Shieldmaidens Wiki
shieldmaidenswiki
https://shieldmaidens.your.wf/Main_Page
MediaWiki 1.40.1
first-letter
Media
Special
Talk
User
User talk
Shieldmaidens Wiki
Shieldmaidens Wiki talk
File
File talk
MediaWiki
MediaWiki talk
Template
Template talk
Help
Help talk
Category
Category talk
Module
Module talk
Main Page
0
1
1
2023-11-05T00:07:27Z
MediaWiki default
1
Create main page
wikitext
text/x-wiki
__NOTOC__
Welcome to your new wiki! You may find the following link useful: [[mw:Help:Contents|Help:Contents]] on mediawiki.org.
2c08c64783dc4305c6dbf13e156368bfa0aed08c
2
1
2023-11-05T00:24:14Z
Sienna
2
wikitext
text/x-wiki
== Welcome! ==
This is the community-driven documentation wiki for Nova, Pleiades and their supporting code bases 🎉
=== Nova Engine ===
The [[Nova Engine]] is a bleeding-edge R&D effort to create a next-generation distributed game engine. It is still heavily under construction as of October 2023: large swaths of it haven't been documented yet, and even larger sections haven't been built at all. It is built on top of the [[Pleiades Supercompute Fabric]].
=== Pleiades Supercompute Fabric ===
The [[Pleiades Supercompute Fabric]], or just Pleiades, is a lightning-fast key-value store, written in Rust. It's an exercise in high-performance data storage at scale, on the order of exabytes.
== Getting started ==
Here you can learn more about the project, its vision, and other key information.
* [[Nova Engine|What is the Nova Engine?]]
* [[Development Environment]]
768590436fe939da0866de00a8ca6960d4d3ba5a
8
2
2023-11-05T01:06:56Z
Sienna
2
/* Getting started */
wikitext
text/x-wiki
== Welcome! ==
This is the community-driven documentation wiki for Nova, Pleiades and their supporting code bases 🎉
=== Nova Engine ===
The [[Nova Engine]] is a bleeding-edge R&D effort to create a next-generation distributed game engine. It is still heavily under construction as of October 2023: large swaths of it haven't been documented yet, and even larger sections haven't been built at all. It is built on top of the [[Pleiades Supercompute Fabric]].
=== Pleiades Supercompute Fabric ===
The [[Pleiades Supercompute Fabric]], or just Pleiades, is a lightning-fast key-value store, written in Rust. It's an exercise in high-performance data storage at scale, on the order of exabytes.
== Getting started ==
Here you can learn more about the project, its vision, and other key information.
* [[Nova Engine|What is the Nova Engine?]]
* [[Development Environment]]
* [[Constellation Mesh|What is a constellation mesh?]]
42864151ccedd3f9030dca6693792e49e7915572
13
8
2023-11-05T02:16:50Z
Sienna
2
/* Getting started */
wikitext
text/x-wiki
== Welcome! ==
This is the community-driven documentation wiki for Nova, Pleiades and their supporting code bases 🎉
=== Nova Engine ===
The [[Nova Engine]] is a bleeding-edge R&D effort to create a next-generation distributed game engine. It is still heavily under construction as of October 2023: large swaths of it haven't been documented yet, and even larger sections haven't been built at all. It is built on top of the [[Pleiades Supercompute Fabric]].
=== Pleiades Supercompute Fabric ===
The [[Pleiades Supercompute Fabric]], or just Pleiades, is a lightning-fast key-value store, written in Rust. It's an exercise in high-performance data storage at scale, on the order of exabytes.
== Getting started ==
Here you can learn more about the project, its vision, and other key information.
* [[Nova Engine|What is the Nova Engine?]]
* [[Contributing|Development Environment]]
* [[Constellation Mesh|What is a constellation mesh?]]
e31f6398a4ff9d44f7fbf14571b7e23635c05fcb
14
13
2023-11-05T02:17:37Z
Sienna
2
/* Nova Engine */
wikitext
text/x-wiki
== Welcome! ==
This is the community-driven documentation wiki for Nova, Pleiades and their supporting code bases 🎉
=== Nova Engine ===
The [[Nova Engine]] is a bleeding-edge R&D effort to create a next-generation distributed simulation engine (re: game engine). It is still heavily under construction as of October 2023: large swaths of it haven't been documented yet, and even larger sections haven't been built at all. It is built on top of the [[Pleiades Supercompute Fabric]].
=== Pleiades Supercompute Fabric ===
The [[Pleiades Supercompute Fabric]], or just Pleiades, is a lightning-fast key-value store, written in Rust. It's an exercise in high-performance data storage at scale, on the order of exabytes.
== Getting started ==
Here you can learn more about the project, its vision, and other key information.
* [[Nova Engine|What is the Nova Engine?]]
* [[Contributing|Development Environment]]
* [[Constellation Mesh|What is a constellation mesh?]]
9c1b714cadfae4525b6e96d2f607b932284c1b1e
500
14
2023-11-05T03:01:34Z
Sienna
2
added discord
wikitext
text/x-wiki
== Welcome! ==
This is the community-driven documentation wiki for Nova, Pleiades and their supporting code bases 🎉
=== Nova Engine ===
The [[Nova Engine]] is a bleeding-edge R&D effort to create a next-generation distributed simulation engine (re: game engine). It is still heavily under construction as of October 2023: large swaths of it haven't been documented yet, and even larger sections haven't been built at all. It is built on top of the [[Pleiades Supercompute Fabric]].
=== Pleiades Supercompute Fabric ===
The [[Pleiades Supercompute Fabric]], or just Pleiades, is a lightning-fast key-value store, written in Rust. It's an exercise in high-performance data storage at scale, on the order of exabytes.
== Getting started ==
Here you can learn more about the project, its vision, and other key information.
* Hang out with & get help from the community on [https://discord.gg/vU3tYqCAPH Discord]
* [[Nova Engine|What is the Nova Engine?]]
* [[Contributing|Development Environment]]
* [[Constellation Mesh|What is a constellation mesh?]]
8eaf72a8846a6dae73ce3df2144adfd17efbc84c
507
500
2023-11-05T05:07:54Z
Sienna
2
wikitext
text/x-wiki
== Welcome! ==
This is the community-driven documentation wiki for Nova, Pleiades and their supporting code bases 🎉
=== Nova Engine ===
The [[Nova Engine]] is a bleeding-edge R&D effort to create a next-generation distributed simulation engine (re: game engine). It is still heavily under construction as of October 2023: large swaths of it haven't been documented yet, and even larger sections haven't been built at all. It is built on top of the [[Pleiades Supercompute Fabric]].
=== Pleiades Supercompute Fabric ===
The [[Pleiades Supercompute Fabric]], or just Pleiades, is a lightning-fast key-value store, written in Rust. It's an exercise in high-performance data storage at scale, on the order of exabytes.
== Getting started ==
Here you can learn more about the project, its vision, and other key information.
* Hang out with & get help from the community on [https://discord.gg/vU3tYqCAPH Discord]
* [[Nova Engine|What is the Nova Engine?]]
* [[Contributing|Development Environment]]
* [[Constellation Mesh|What is a constellation mesh?]]
* [[Code of Conduct]]
9db018d6772fc6d4dd2bf17c2f2c4db8488dec0a
Pleiades Supercompute Fabric
0
2
3
2023-11-05T00:43:51Z
Sienna
2
init
wikitext
text/x-wiki
The Pleiades Supercompute Fabric is a next-generation supercomputer designed for massive simulation scales.
== Vision ==
The core vision for Pleiades is a globally distributed runtime focusing on low-latency data operations, exabyte-scale data storage, and a straightforward multi-modal interface.
As the software industry moves forward, there is a massive hole that large systemic enterprises, financial institutions, government agencies, large scientific or educational institutions, and other massive, complex organizations have: data management at scale. Web2 and its related technologies are powerful and reliable, but they're oftentimes not up to par for the needs of systemically critical systems. Extensions exist and abound for many solutions, but ultimately, even empires get replaced by mausoleums. Web3 is capable, but many of the technologies and directions Web3 is taking will never serve the real world in meaningful ways.
Pleiades is not better, worse, or ultimately comparable to a lot of systems which exist, and this is by design. While the goal is to have Pleiades be a semi-easy drop-in migration from some existing solutions, it is nothing like most existing solutions. Pleiades makes trade-offs between a simple user experience and a complex internal architecture. While Pleiades must be easy to use for end-users (operators are also end users), it can't sacrifice end-user simplicity for functional capability.
At its core, Pleiades is informed by a lot of different solutions. This is a non-exhaustive list, in no particular order, of the solutions which have most informed the core design patterns:
* TiKV<ref name="tikv">[https://github.com/tikv/tikv TiKV], TiKV Authors</ref>
* CockroachDB<ref name="cockroachdb">[https://github.com/cockroachdb/cockroach CockroachDB], Cockroach Labs</ref>
* MongoDB<ref name="mongodb">[https://github.com/mongodb/mongo MongoDB], MongoDB, Inc.</ref>
* RocksDB<ref name="rocksdb">[https://github.com/facebook/rocksdb/ RocksDB], Facebook</ref>
* PostgreSQL<ref name="psql">[https://www.postgresql.org/ PostgreSQL], The PostgreSQL Global Development Group</ref>
* Trino<ref name="trino">[https://github.com/trinodb/trino Trino], Trino Software Foundation</ref>
* Redis<ref name="redis">[https://redis.io/ Redis], Redis Ltd.</ref>
* Azure CosmosDB<ref name="cosmosdb">[https://azure.microsoft.com/en-us/products/cosmos-db/ CosmosDB], Microsoft</ref>
* Google Spanner<ref name="spanner">[https://cloud.google.com/spanner Cloud Spanner], Google, Inc.</ref>
* Neo4J<ref name="neo4j">[https://neo4j.com/ Neo4j], Neo4j, Inc.</ref>
* Ethereum<ref name="eth">[https://ethereum.org/ Ethereum], Ethereum Foundation</ref>
* LibP2P<ref name="libp2p">[https://libp2p.io/ libp2p], Protocol Labs</ref>
* and others
Each of these different solutions provides different bits and pieces of research and design insight that ultimately inform the things that make Pleiades, well, Pleiades. For example, Pleiades should have change streaming, similar to MongoDB, but it also needs to scale like Spanner while having the same performance characteristics as TiKV and enabling self-scaling like CockroachDB.
Part of this vision also means keeping it accessible to all organizations. You can use Pleiades for anything you want, so long as you're not selling it. Contributions from consumers aren't mandatory, but a rising tide lifts all ships.
== Contributing ==
Check out the [[Contributing|contributions page]] for more information on setting up your development environment, and what the process is for getting your changes into mainline.
== Ecosystem Contributions ==
Giving back to the community is important. Here are contributions the Pleiades contributors have made to various ecosystems!
* Extending <code>btree_map</code> support within [https://github.com/neoeinstein/protoc-gen-prost/pull/79 protoc-gen-prost-serde]
=== Notes ===
<references />
02679b8fa6d8b220b769e57e10d5b920c7fa4eeb
4
3
2023-11-05T00:48:19Z
Sienna
2
wikitext
text/x-wiki
The Pleiades Supercompute Fabric is a next-generation supercomputer designed for massive simulation scales.
== Vision ==
The core vision for Pleiades is a globally distributed runtime focusing on low-latency data operations, exabyte-scale data storage, and a straightforward multi-modal interface.
As the software industry moves forward, there is a massive hole that large systemic enterprises, financial institutions, government agencies, large scientific or educational institutions, and other massive, complex organizations have: data management at scale. Web2 and its related technologies are powerful and reliable, but they're oftentimes not up to par for the needs of systemically critical systems. Extensions exist and abound for many solutions, but ultimately, even empires get replaced by mausoleums. Web3 is capable, but many of the technologies and directions Web3 is taking will never serve the real world in meaningful ways.
Pleiades is not better, worse, or ultimately comparable to a lot of systems which exist, and this is by design. While the goal is to have Pleiades be a semi-easy drop-in migration from some existing solutions, it is nothing like most existing solutions. Pleiades makes trade-offs between a simple user experience and a complex internal architecture. While Pleiades must be easy to use for end-users (operators are also end users), it can't sacrifice end-user simplicity for functional capability.
At its core, Pleiades is informed by a lot of different solutions. This is a non-exhaustive list, in no particular order, of the solutions which have most informed the core design patterns:
* TiKV<ref name="tikv">[https://github.com/tikv/tikv TiKV], TiKV Authors</ref>
* CockroachDB<ref name="cockroachdb">[https://github.com/cockroachdb/cockroach CockroachDB], Cockroach Labs</ref>
* MongoDB<ref name="mongodb">[https://github.com/mongodb/mongo MongoDB], MongoDB, Inc.</ref>
* RocksDB<ref name="rocksdb">[https://github.com/facebook/rocksdb/ RocksDB], Facebook</ref>
* PostgreSQL<ref name="psql">[https://www.postgresql.org/ PostgreSQL], The PostgreSQL Global Development Group</ref>
* Trino<ref name="trino">[https://github.com/trinodb/trino Trino], Trino Software Foundation</ref>
* Redis<ref name="redis">[https://redis.io/ Redis], Redis Ltd.</ref>
* Azure CosmosDB<ref name="cosmosdb">[https://azure.microsoft.com/en-us/products/cosmos-db/ CosmosDB], Microsoft</ref>
* Google Spanner<ref name="spanner">[https://cloud.google.com/spanner Cloud Spanner], Google, Inc.</ref>
* Neo4J<ref name="neo4j">[https://neo4j.com/ Neo4j], Neo4j, Inc.</ref>
* Ethereum<ref name="eth">[https://ethereum.org/ Ethereum], Ethereum Foundation</ref>
* LibP2P<ref name="libp2p">[https://libp2p.io/ libp2p], Protocol Labs</ref>
* and others
Each of these different solutions provides different bits and pieces of research and design insight that ultimately inform the things that make Pleiades, well, Pleiades. For example, Pleiades should have change streaming, similar to MongoDB, but it also needs to scale like Spanner<ref name="spanner" /> while having the same performance characteristics as TiKV<ref name="tikv" /> and enabling self-scaling like CockroachDB<ref name="cockroachdb" />.
Part of this vision also means keeping it accessible to all organizations. You can use Pleiades for anything you want, so long as you're not selling it. Contributions from consumers aren't mandatory, but a rising tide lifts all ships.
== Architecture ==
You can find more information about the internal Pleiades architecture on the [[Pleiades Architecture|architecture page]].
== Contributing ==
Check out the [[Contributing|contributions page]] for more information on setting up your development environment, and what the process is for getting your changes into mainline.
== Ecosystem Contributions ==
Giving back to the community is important. Here are some contributions the Pleiades contributors have made to various ecosystems!
* Extending <code>btree_map</code> support within [https://github.com/neoeinstein/protoc-gen-prost/pull/79 protoc-gen-prost-serde] for better protocol buffer serialization
=== Notes ===
<references />
536b47d13d52cc6ccc18a33db5711029479bff19
5
4
2023-11-05T00:48:56Z
Sienna
2
wikitext
text/x-wiki
The Pleiades Supercompute Fabric is a next-generation supercomputer designed for massive simulation environments.
== Vision ==
The core vision for Pleiades is a globally distributed runtime focusing on low-latency data operations, exabyte-scale data storage, and a straightforward multi-modal interface.
As the software industry moves forward, there is a massive hole that large systemic enterprises, financial institutions, government agencies, large scientific or educational institutions, and other massive, complex organizations have: data management at scale. Web2 and its related technologies are powerful and reliable, but they're oftentimes not up to par for the needs of systemically critical systems. Extensions exist and abound for many solutions, but ultimately, even empires get replaced by mausoleums. Web3 is capable, but many of the technologies and directions Web3 is taking will never serve the real world in meaningful ways.
Pleiades is not better, worse, or ultimately comparable to a lot of systems which exist, and this is by design. While the goal is to have Pleiades be a semi-easy drop-in migration from some existing solutions, it is nothing like most existing solutions. Pleiades makes trade-offs between a simple user experience and a complex internal architecture. While Pleiades must be easy to use for end-users (operators are also end users), it can't sacrifice end-user simplicity for functional capability.
At its core, Pleiades is informed by a lot of different solutions. This is a non-exhaustive list, in no particular order, of the solutions which have most informed the core design patterns:
* TiKV<ref name="tikv">[https://github.com/tikv/tikv TiKV], TiKV Authors</ref>
* CockroachDB<ref name="cockroachdb">[https://github.com/cockroachdb/cockroach CockroachDB], Cockroach Labs</ref>
* MongoDB<ref name="mongodb">[https://github.com/mongodb/mongo MongoDB], MongoDB, Inc.</ref>
* RocksDB<ref name="rocksdb">[https://github.com/facebook/rocksdb/ RocksDB], Facebook</ref>
* PostgreSQL<ref name="psql">[https://www.postgresql.org/ PostgreSQL], The PostgreSQL Global Development Group</ref>
* Trino<ref name="trino">[https://github.com/trinodb/trino Trino], Trino Software Foundation</ref>
* Redis<ref name="redis">[https://redis.io/ Redis], Redis Ltd.</ref>
* Azure CosmosDB<ref name="cosmosdb">[https://azure.microsoft.com/en-us/products/cosmos-db/ CosmosDB], Microsoft</ref>
* Google Spanner<ref name="spanner">[https://cloud.google.com/spanner Cloud Spanner], Google, Inc.</ref>
* Neo4J<ref name="neo4j">[https://neo4j.com/ Neo4j], Neo4j, Inc.</ref>
* Ethereum<ref name="eth">[https://ethereum.org/ Ethereum], Ethereum Foundation</ref>
* LibP2P<ref name="libp2p">[https://libp2p.io/ libp2p], Protocol Labs</ref>
* and others
Each of these different solutions provides different bits and pieces of research and design insight that ultimately inform the things that make Pleiades, well, Pleiades. For example, Pleiades should have change streaming, similar to MongoDB, but it also needs to scale like Spanner<ref name="spanner" /> while having the same performance characteristics as TiKV<ref name="tikv" /> and enabling self-scaling like CockroachDB<ref name="cockroachdb" />.
Part of this vision also means keeping it accessible to all organizations. You can use Pleiades for anything you want, so long as you're not selling it. Contributions from consumers aren't mandatory, but a rising tide lifts all ships.
== Architecture ==
You can find more information about the internal Pleiades architecture on the [[Pleiades Architecture|architecture page]].
== Contributing ==
Check out the [[Contributing|contributions page]] for more information on setting up your development environment, and what the process is for getting your changes into mainline.
== Ecosystem Contributions ==
Giving back to the community is important. Here are some contributions the Pleiades contributors have made to various ecosystems!
* Extending <code>btree_map</code> support within [https://github.com/neoeinstein/protoc-gen-prost/pull/79 protoc-gen-prost-serde] for better protocol buffer serialization
=== Notes ===
<references />
fa2d57275c64e8066f9abae4e3d7ec936f881cd4
512
5
2023-11-19T01:01:20Z
Sienna
2
/* Architecture */ updated link to disk layout
wikitext
text/x-wiki
The Pleiades Supercompute Fabric is a next-generation supercomputer designed for massive simulation environments.
== Vision ==
The core vision for Pleiades is a globally distributed runtime focusing on low-latency data operations, exabyte-scale data storage, and a straightforward multi-modal interface.
As the software industry moves forward, there is a massive hole that large systemic enterprises, financial institutions, government agencies, large scientific or educational institutions, and other massive, complex organizations have: data management at scale. Web2 and its related technologies are powerful and reliable, but they're oftentimes not up to par for the needs of systemically critical systems. Extensions exist and abound for many solutions, but ultimately, even empires get replaced by mausoleums. Web3 is capable, but many of the technologies and directions Web3 is taking will never serve the real world in meaningful ways.
Pleiades is not better, worse, or ultimately comparable to a lot of systems which exist, and this is by design. While the goal is to have Pleiades be a semi-easy drop-in migration from some existing solutions, it is nothing like most existing solutions. Pleiades makes trade-offs between a simple user experience and a complex internal architecture. While Pleiades must be easy to use for end-users (operators are also end users), it can't sacrifice end-user simplicity for functional capability.
At its core, Pleiades is informed by a lot of different solutions. This is a non-exhaustive list, in no particular order, of the solutions which have most informed the core design patterns:
* TiKV<ref name="tikv">[https://github.com/tikv/tikv TiKV], TiKV Authors</ref>
* CockroachDB<ref name="cockroachdb">[https://github.com/cockroachdb/cockroach CockroachDB], Cockroach Labs</ref>
* MongoDB<ref name="mongodb">[https://github.com/mongodb/mongo MongoDB], MongoDB, Inc.</ref>
* RocksDB<ref name="rocksdb">[https://github.com/facebook/rocksdb/ RocksDB], Facebook</ref>
* PostgreSQL<ref name="psql">[https://www.postgresql.org/ PostgreSQL], The PostgreSQL Global Development Group</ref>
* Trino<ref name="trino">[https://github.com/trinodb/trino Trino], Trino Software Foundation</ref>
* Redis<ref name="redis">[https://redis.io/ Redis], Redis Ltd.</ref>
* Azure CosmosDB<ref name="cosmosdb">[https://azure.microsoft.com/en-us/products/cosmos-db/ CosmosDB], Microsoft</ref>
* Google Spanner<ref name="spanner">[https://cloud.google.com/spanner Cloud Spanner], Google, Inc.</ref>
* Neo4J<ref name="neo4j">[https://neo4j.com/ Neo4j], Neo4j, Inc.</ref>
* Ethereum<ref name="eth">[https://ethereum.org/ Ethereum], Ethereum Foundation</ref>
* LibP2P<ref name="libp2p">[https://libp2p.io/ libp2p], Protocol Labs</ref>
* and others
Each of these different solutions provides different bits and pieces of research and design insight that ultimately inform the things that make Pleiades, well, Pleiades. For example, Pleiades should have change streaming, similar to MongoDB, but it also needs to scale like Spanner<ref name="spanner" /> while having the same performance characteristics as TiKV<ref name="tikv" /> and enabling self-scaling like CockroachDB<ref name="cockroachdb" />.
Part of this vision also means keeping it accessible to all organizations. You can use Pleiades for anything you want, so long as you're not selling it. Contributions from consumers aren't mandatory, but a rising tide lifts all ships.
== Architecture ==
You can find more information about the internal Pleiades architecture on the [[Pleiades Architecture|architecture page]].
If you're curious about the disk layout, you can find it [[Disk layout|here]].
== Contributing ==
Check out the [[Contributing|contributions page]] for more information on setting up your development environment, and what the process is for getting your changes into mainline.
== Ecosystem Contributions ==
Giving back to the community is important. Here are some contributions the Pleiades contributors have made to various ecosystems!
* Extending <code>btree_map</code> support within [https://github.com/neoeinstein/protoc-gen-prost/pull/79 protoc-gen-prost-serde] for better protocol buffer serialization
=== Notes ===
<references />
7d743ec45a7c86c460d234d3bdbf0e73785d627c
Pleiades Architecture
0
3
6
2023-11-05T01:03:27Z
Sienna
2
Created page with "== Overview == Pleiades is grouped into a few different classifications: * Components * Aspects * Services Each of these classifications provides different bits of functionality. Components are self-contained units of functionality that can be reused across different parts of Pleiades. They can be thought of as building blocks that can be combined with other components to create larger, more complex systems. Aspects, on the other hand, are cross-cutting features or fu..."
wikitext
text/x-wiki
== Overview ==
Pleiades is grouped into a few different classifications:
* Components
* Aspects
* Services
Each of these classifications provides different bits of functionality. Components are self-contained units of functionality that can be reused across different parts of Pleiades. They can be thought of as building blocks that can be combined with other components to create larger, more complex systems. Aspects, on the other hand, are cross-cutting features or functionality that affect multiple components or modules. Services are the long-running pieces that expose that functionality to the rest of the constellation.
Everything in these three classifications is either runtime-centric or library-centric. Runtime-centric pieces focus on managing the state of a Pleiades node (or larger constellation), while library-centric pieces only provide reusable functionality. For the most part, Pleiades aims to keep the runtime code fairly light, with a focal point on event-driven wrappers of library functionality.
== v3 ==
Here are the major components, aspects, and services that are a part of the v3 architecture.
=== Components ===
* HLC
* Raft Engine
* ZeroMQ
* RocksDB
=== Aspects ===
* Storage Engine
* Netcode & RPC framework
* Messaging substrate
=== Services ===
* Gossip
* kvstore
* Raft
* Messaging
* System
== v2 ==
While no longer in use, these are some of the designs of the v2 architecture.
=== KV Store Runtime Architecture ===
This is the general architecture of the KVStore, the core monolithic key-value subsystem that underpins Pleiades.
==== Interface ====
The most up-to-date interfaces can be found in the code base, but here is the general interface structure. It includes two major features: transactions and kv operations. Transactions are atomic (by design), and are currently implicit due to the disk-based state machine implementation with bbolt.
<syntaxhighlight lang="go">
// ITransactionManager owns the lifecycle of raft-backed transactions.
type ITransactionManager interface {
	CloseTransaction(ctx context.Context, transaction *kvpb.Transaction) error
	Commit(ctx context.Context, transaction *kvpb.Transaction) *kvpb.Transaction
	GetNoOpTransaction(shardId uint64) *kvpb.Transaction
	GetTransaction(ctx context.Context, shardId uint64) (*kvpb.Transaction, error)
	SessionFromClientId(clientId uint64) (*dclient.Session, bool)
}

// IKVStore exposes the account, bucket, and key operations.
type IKVStore interface {
	CreateAccount(request *kvpb.CreateAccountRequest) (*kvpb.CreateAccountResponse, error)
	DeleteAccount(request *kvpb.DeleteAccountRequest) (*kvpb.DeleteAccountResponse, error)
	CreateBucket(request *kvpb.CreateBucketRequest) (*kvpb.CreateBucketResponse, error)
	DeleteBucket(request *kvpb.DeleteBucketRequest) (*kvpb.DeleteBucketResponse, error)
	GetKey(request *kvpb.GetKeyRequest) (*kvpb.GetKeyResponse, error)
	PutKey(request *kvpb.PutKeyRequest) (*kvpb.PutKeyResponse, error)
	DeleteKey(request *kvpb.DeleteKeyRequest) (*kvpb.DeleteKeyResponse, error)
}
</syntaxhighlight>
This isn't a perfectly ideal interface, but it's a good starting point. Once the transactions interface is properly implemented, it will be more effective.
==== Hot Path ====
This is a high-level architecture of the hot path for KV operations. KV operations ''must'' be as performant as possible, as every higher-order use case will be built on the kvstore and be wholly reliant on its performance.
{{Note|'''Note:''' ''This diagram assumes a raft shard has been properly provisioned and is known in advance. Check out the shard lifecycle documentation (<code>lifecycles-v2.md#Shards</code>) for more information.''}}
<pre>
%%{init: { 'logLevel': 'debug', 'theme': 'base' } }%%
graph
    kvEvent -- 1 --> server.kvStoreHandler -- 2 --> server.raftBboltManager
    server.raftBboltManager -- 3 --> server.transactionManager -- 4 --> runtime.ITransactionManager
    runtime.ITransactionManager --> dragonboat.nodeHost -. 5 .-> server.transactionManager -. 6 .-> server.raftBboltManager
    server.raftBboltManager -- 7 --> nodeHost.SyncPropose
    nodeHost.SyncPropose -- 7 --> nodeHost.IStateMachine -- 7 --> nodeHost.IOnDiskStateMachine
    nodeHost.IOnDiskStateMachine ---> kv.BBoltStateMachine
    kv.BBoltStateMachine -- 10 --> fsm.bboltStore -. 11 .-> kv.BBoltStateMachine
    kv.BBoltStateMachine -. 12 .-> nodeHost.IStateMachine -. 13 .-> nodeHost.SyncPropose -. 14 .-> server.raftBboltManager
    server.raftBboltManager -. 15 .-> server.transactionManager
    server.raftBboltManager -. 16 .-> server.kvStoreHandler -. 17 .-> kvEvent
</pre>
# A <code>kvEvent</code> occurs. Currently these arrive through the Connect<ref name="connect">[https://connectrpc.com/ Connect], Buf</ref> functionality using protobufs.
# Once the <code>kvEvent</code> occurs, it's handled by the <code>kvStoreHandler</code> in the <code>server</code> package. This layer currently exists only as a thin wrapper.
# A local transaction is requested, but this is currently not implemented.
# The interface is called, but it is backed by <code>dragonboat.NodeHost</code>.
# The transaction, if applicable, is bubbled up and stored in the transaction manager's local cache.
# The cached version is returned to the <code>raftBboltStoreManager</code>, which is just a packaging wrapper.
# The event is sent through the synchronous interface of <code>dragonboat.NodeHost.SyncPropose</code>, where it's repackaged as a byte array. While mildly inconvenient, this is useful as we can pass anything through this layer with minimal concerns.
3532f6b45503bf0bc21015dba356ab88f0720074
501
6
2023-11-05T04:17:31Z
Sienna
2
wikitext
text/x-wiki
== Overview ==
Pleiades is grouped into a few different classifications:
* Components
* Aspects
* Services
Each of these classifications provides different bits of functionality. Components are self-contained units of functionality that can be reused across different parts of Pleiades. They can be thought of as building blocks that can be combined with other components to create larger, more complex systems. Aspects, on the other hand, are cross-cutting features or functionality that affect multiple components or modules. Services are the long-running pieces that expose that functionality to the rest of the constellation.
Everything in these three classifications is either runtime-centric or library-centric. Runtime-centric pieces focus on managing the state of a Pleiades node (or larger constellation), while library-centric pieces only provide reusable functionality. For the most part, Pleiades aims to keep the runtime code fairly light, with a focal point on event-driven wrappers of library functionality.
== v3 Proposal ==
Here are the major components, aspects, and services that are a part of the v3 architecture.
=== Components ===
* HLC
* Raft Engine
* ZeroMQ
* RocksDB
=== Aspects ===
* Storage Engine
* Netcode & RPC framework
* Messaging substrate
=== Services ===
* Gossip
* kvstore
* Raft
* Messaging
* System
Pleiades is a collection of different types, layers, and aspects of technologies that enable it to operate successfully. Right now, Pleiades is at v2 - it's already gone through several early rewrites after validating assumptions, design patterns, etc. This doc describes the technologies being targeted for the v3 rewrite.
The technologies listed below are grouped by category, but not in any particular order.
=== Programming Language ===
Pleiades v1 and v2 both use Go. Go is a very powerful language and allowed Pleiades to go through many quick iterations of technology. Its concurrency model made it easy to design Pleiades' monolithic but modular architecture, and enabled high throughput on most workloads. However, Go has also been exceedingly limiting due to its memory management model.
The value of Go for Pleiades was [[User:Sienna|Sienna's]] familiarity with the ecosystem, a large and diverse CNCF ecosystem to pull libraries from, well-respected CNCF vendors with excellent reference libraries, and a vibrant community. However, Go's memory management model has been extremely limiting when it comes to performance. Due to Go's GC and automatic memory management, it's incredibly difficult to determine where objects are allocated and what their lifecycles are, and nil pointer dereferences are difficult to debug in a massive monolith. Go's parametric generics are simply too basic: they don't allow for covariance and can't realistically be used effectively at scale. Go also uses Plan9 assembly, which is nearly impossible to write due to a near-complete lack of documentation. Go is useful for infrastructure applications, but not low-level infrastructure.
Pleiades v3 is targeting a complete rewrite in Rust. Rust's memory management, lifetimes, generics, and general type system are substantially stronger than Go's, and it supports intrinsics through LLVM. Rust's ownership and lifetime model enables faster performance, and its threading model is much more robust than Go's goroutines. This does create extensive overhead, as large swaths of the existing Pleiades v2 code base come from third parties, and several of the more important subsystems will need to be completely rewritten from scratch, ported, or replaced with alternatives. Overall, Rust's performance characteristics, memory management, and LLVM integration make it a much more suitable language for Pleiades v3.
=== Networking ===
The core networking stack of Pleiades v3 will be based on QUIC, as it provides faster connection setup and streams that avoid head-of-line blocking. Pleiades v2 is currently a mixture of gRPC and varying TCP implementations.
==== QUIC ====
QUIC is the underlying networking technology. It grew out of Google's experimental QUIC protocol (itself an outgrowth of the SPDY work) and was standardized by the IETF in [https://www.rfc-editor.org/rfc/rfc9000.html RFC 9000]. QUIC provides 0-RTT handshakes, multiple streams per connection, full TLS connection security, ordered bidi streaming, passive latency monitoring, and connection migration. It is incredibly performant and is the underlying technology of HTTP/3.
The implementation that Pleiades will likely use for v3 is [https://github.com/cloudflare/quiche quiche]. Cloudflare's networking backbone is some of the best in the industry, and their Rust implementation is among the most used. For more information on QUIC, see Cloudflare's [https://cloudflare-quic.com/ landing page].
=== RPC & Messaging ===
While gRPC provides useful RPC functionality, it is HTTP-based, and being at the mercy of the ecosystem is miserable. Pleiades v2 uses it, and it is a major lesson learned for the project. gRPC is incredibly slow, and the ecosystem is driven by ''largest consumer needs'', so HTTP/3 support won't come for years, and even then it's still HTTP. Pleiades v2 also embeds [https://nats.io NATS].
Pleiades v3 will no longer use gRPC beyond bootstrapping, relying instead on two variations of QUIC. It's expected that there will be two layers of RPC-style networking: one for the very low-level raft, gossip, and kvstore subsystems; the other for higher-order application functionality such as messaging, queuing, and pubsub. Being at the mercy of the gRPC ecosystem is hellish, miserable, and generally a great way to let your technology rot. That being said, protocol buffers are very powerful and useful for encoding and framing, so those will stay.
==== RPC ====
At the lowest level, Pleiades v3 will implement a custom QUIC-based protocol that uses magic bytes and protobuf framing to handle routing and message passing. While QUIC supports ordered bidi streaming on a per-stream basis, it's likely that each stream handler will maintain two streams per protocol type for the sake of short-term simplicity. The use of magic bytes is primarily to annotate and notify changes in configurations, routing, etc., and will be limited, as protobufs are fully framed. There is an open proposal for [[rtRPC|rtRPC]] that would support better message passing between services.
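To make the framing idea concrete, below is a minimal sketch of a magic-byte, length-prefixed protobuf frame. It's in Go to match the v2 code base, and every detail in it (the magic values, the 4-byte length header, the names) is an illustrative assumption rather than the actual v3 wire format.
<syntaxhighlight lang="go">
package framing

import (
	"encoding/binary"
	"fmt"
	"io"
)

// Illustrative values only; the real protocol's magic bytes and
// framing layout are still being designed.
const (
	magicData   byte = 0x01 // payload frame carrying a marshaled protobuf
	magicConfig byte = 0x02 // control frame announcing config/routing changes
)

// WriteFrame writes [magic][4-byte big-endian length][payload] to w.
// The payload is assumed to be an already-marshaled protobuf.
func WriteFrame(w io.Writer, magic byte, payload []byte) error {
	header := make([]byte, 5)
	header[0] = magic
	binary.BigEndian.PutUint32(header[1:], uint32(len(payload)))
	if _, err := w.Write(header); err != nil {
		return err
	}
	_, err := w.Write(payload)
	return err
}

// ReadFrame reads one frame and returns its magic byte and payload.
func ReadFrame(r io.Reader) (byte, []byte, error) {
	header := make([]byte, 5)
	if _, err := io.ReadFull(r, header); err != nil {
		return 0, nil, err
	}
	payload := make([]byte, binary.BigEndian.Uint32(header[1:]))
	if _, err := io.ReadFull(r, payload); err != nil {
		return 0, nil, fmt.Errorf("short frame: %w", err)
	}
	return header[0], payload, nil
}
</syntaxhighlight>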
==== Messaging ====
For higher-order application messaging needs, Pleiades v2 currently uses embedded NATS as the queuing and pubsub messaging provider. NATS is incredibly powerful, incredibly heavy, and also written in Go. Outside of Go, the only major message queuing platform that seems to be a good fit is [https://zeromq.org/ ZeroMQ]. zmq is a very powerful solution written in C++; there are several Rust bindings for it, and one [https://github.com/zeromq/zmq.rs full-Rust implementation]. The only limitation of zmq (bindings or native) is that there are currently no QUIC socket implementations.
=== Network Interfacing ===
Pleiades v3 will likely define a standard set of network traits that each library can implement to leverage the networking library in an RPC-adjacent manner. This is dependent on Rust's memory model, and whether or not it's a good design pattern.
For the protocol buffer implementation, [https://crates.io/crates/quick-protobuf quick-protobuf] is currently attractive because it's low-level and uses clone-on-write. It also doesn't require <code>protoc</code> or other external tools, which is a big plus.
=== Clustering & Automation ===
Pleiades v2 uses powerful libraries from well-respected tech companies to manage clustering, membership, and other pieces of its autonomy. However, nearly all of these technologies are in Go and must be ported with non-trivial modifications, mostly to the networking.
=== Gossip ===
Pleiades v2 targeted [https://github.com/hashicorp/serf Serf] for its internal gossiping structure. The value of Serf was its mixture of [https://www.cs.cornell.edu/projects/Quicksilver/public_pdfs/SWIM.pdf SWIM] and the [https://sites.cs.ucsb.edu/~ravenben/classes/276/papers/vivaldi-sigcomm04.pdf Vivaldi network tomography] system. However, Serf is written in Go and maintained by Hashicorp, who are relicensing all of their new versions and products under the BSL moving forward. So something has to be done.
The value of SWIM + network tomography is its infectious gossiping, clustering, and location awareness. This allows Pleiades to loosely model a force-directed graph, where changes are rolled out through loosely connected nodes, and the network tomography allows for location-based clustering without needing to define locations. Pleiades v3 will keep SWIM and the Vivaldi network tomography functionality for its gossiping patterns. This is an essential architectural piece which allows Pleiades v3 to remain an automated constellation mesh database (re: fully-autonomous system). For the curious, Hashicorp published a [https://www.serf.io/docs/internals/simulator.html configurable convergence simulator] for SWIM; it's worth a look-see if you have performance questions.
However, there is a substantial bit of work which needs to be done to port Hashicorp's reference implementation and extensions. Hashicorp's [https://arxiv.org/abs/1707.00788 Lifeguard extensions] to SWIM are relatively minor, but the Vivaldi implementation contains several extensions from the [https://www.usenix.org/legacy/events/nsdi07/tech/full_papers/ledlie/ledlie_html/index_save.html Network Coordinates in the Wild] USENIX paper and IBM Research's [https://dominoweb.draco.res.ibm.com/492D147FCCEA752C8525768F00535D8A.html Euclidean Embedding] paper. While these extensions can be trusted due to long-term production usage, they also make the code harder to follow when referencing the papers.
The reference SWIM implementation (with Lifeguard) is called [https://github.com/hashicorp/memberlist memberlist]. memberlist bundles its own networking and a custom packet implementation for the protocol, but the Pleiades v3 port can't have either. The network messages need to be protobufs, and the networking implementation will need to support the QUIC-based stack that Pleiades will use. Otherwise, the rest of the functionality should be a relatively straightforward port using Rust's stdlib.
The Vivaldi implementation sits in the Serf [https://github.com/hashicorp/serf/tree/master/coordinate tree] and is fairly self-contained. Most of the structs can be converted to protobufs for external interfacing, but otherwise this library is fairly straightforward.
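For a feel of the algorithm, here is a heavily simplified sketch of the basic Vivaldi update step from the SIGCOMM paper, in Go to match the v2 code base. It deliberately omits the height vector, adaptive timestep, and the Network Coordinates in the Wild extensions that the Serf tree actually ships, so treat it as a teaching aid rather than the port.
<syntaxhighlight lang="go">
package vivaldi

import "math"

// Coord is a point in a 2-D synthetic coordinate space. The real Serf
// implementation uses more dimensions plus a height term; this sketch
// keeps only the core idea.
type Coord struct{ X, Y float64 }

func (a Coord) sub(b Coord) Coord { return Coord{a.X - b.X, a.Y - b.Y} }
func (a Coord) norm() float64     { return math.Hypot(a.X, a.Y) }

// Update nudges our coordinate so that coordinate distance better
// predicts the measured RTT to the peer. delta is the timestep, which
// the paper adapts based on running error estimates.
func Update(self, peer Coord, rttSeconds, delta float64) Coord {
	diff := self.sub(peer)
	dist := diff.norm()
	unit := Coord{1, 0} // arbitrary direction if the coordinates coincide
	if dist > 0 {
		unit = Coord{diff.X / dist, diff.Y / dist}
	}
	// Positive force pushes us away from the peer (we're too close in
	// coordinate space to explain the RTT); negative pulls us toward it.
	force := rttSeconds - dist
	return Coord{self.X + delta*force*unit.X, self.Y + delta*force*unit.Y}
}
</syntaxhighlight>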
=== Clocks (Under Construction) ===
''nb (sienna): this changes with the advent of Huygens from Google in 2018.''
Pleiades v1 and v2 do not implement clocks, and this is a major design flaw. During prototyping, none of the major versions ever made it far enough to need a clock. However, for ranges to work with atomic transactions, Pleiades v3 will need accurate clocks. The OS will always sync with the system clock, but Pleiades v3 will port CockroachDB's [https://github.com/cockroachdb/cockroach/blob/master/pkg/util/hlc/doc.go hybrid logical clock (HLC) implementation]. Lamport clocks have too much skew for a tightly-knit system like Pleiades, and CockroachDB's HLC implementation is stable in production. This port should be very straightforward.
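For reference, the core HLC update rules are tiny. The sketch below (Go, matching the v2 code base) follows the general send/receive rules from the HLC literature and is only illustrative; CockroachDB's actual implementation layers on monotonicity checks, max-offset enforcement, and much more.
<syntaxhighlight lang="go">
package hlc

import (
	"sync"
	"time"
)

// Timestamp is a hybrid logical clock reading: physical wall time plus
// a logical counter that breaks ties within the same wall tick.
type Timestamp struct {
	WallTime int64 // nanoseconds
	Logical  int32
}

type Clock struct {
	mu  sync.Mutex
	now func() int64 // injectable physical clock, handy for tests
	ts  Timestamp
}

func New() *Clock {
	return &Clock{now: func() int64 { return time.Now().UnixNano() }}
}

// Now returns a timestamp for a local or send event.
func (c *Clock) Now() Timestamp {
	c.mu.Lock()
	defer c.mu.Unlock()
	if pt := c.now(); pt > c.ts.WallTime {
		c.ts = Timestamp{WallTime: pt}
	} else {
		c.ts.Logical++
	}
	return c.ts
}

// Update merges a timestamp received from a remote node, keeping this
// clock ahead of everything it has observed.
func (c *Clock) Update(remote Timestamp) Timestamp {
	c.mu.Lock()
	defer c.mu.Unlock()
	pt := c.now()
	switch {
	case pt > c.ts.WallTime && pt > remote.WallTime:
		c.ts = Timestamp{WallTime: pt}
	case remote.WallTime > c.ts.WallTime:
		c.ts = Timestamp{WallTime: remote.WallTime, Logical: remote.Logical + 1}
	case c.ts.WallTime > remote.WallTime:
		c.ts.Logical++
	default: // equal wall times: advance past both logical counters
		if remote.Logical > c.ts.Logical {
			c.ts.Logical = remote.Logical
		}
		c.ts.Logical++
	}
	return c.ts
}
</syntaxhighlight>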
=== Lifecycle Automation ===
Pleiades v1 contained no dependency injection, but Pleiades v2 uses Uber's [https://github.com/uber-go/fx fx] framework. This gives a baseline DI framework that's good enough to handle startup and shutdown events. However, fx is overcomplicated and not worth porting, so Pleiades v3 can use whatever is most popular in the Rust ecosystem. DI is imperative, as Pleiades is a modular monolith and there needs to be both control of background services and DI for things like service-to-service clients, network handlers, etc.
Pleiades v1 and v2 did not contain lifecycle workflows, but Pleiades v3 will contain a small workflow engine to keep the internal lifecycle events manageable. As each node in the constellation is fully autonomous, the ability to handle complex workflows for CREs is important. This will help the project maintainers advance the constellation's internals without completely rewriting large swaths of internal logic every time there's a logic change to a CRE workflow. Daniel Gerlag's [https://github.com/danielgerlag/workflow-core Workflow Core] is an excellent embeddable workflow engine in C# and is the reference implementation for the Pleiades v3 workflow engine. Not all of its features or functionality will be needed, so the port will primarily be just the workflow engine and enough netcode to keep it controllable and observable.
=== Config Automation ===
Pleiades v1 used Steve Francia's [https://github.com/spf13/viper viper] and [https://github.com/spf13/cobra cobra] libraries, whereas Pleiades v2 only used viper and a custom port of Mitchell Hashimoto's [https://github.com/mitchellh/cli CLI library]. Viper is a very powerful but overcomplicated configuration management library that provides the core configuration, but it is like using a shotgun on work that needs a scalpel. For the CLI and configuration libraries, whatever is both popular in Rust and minimal will likely be the correct choice to integrate. Ideally, these would be external dependencies instead of ported code.
=== Storage ===
Storage in Pleiades v1 was an absolute mess (hey, it was a prototype lol), and Pleiades v2's storage layer is better, but still not great. Pleiades v3 aims to fix that by having a single, unified storage layer that provides local, shard, and global storage opportunities to all consumers (with caveats). Both Pleiades v1 and v2 are built on Raft, but Pleiades v2 uses multi-raft.
==== Disk Storage ====
Pleiades v1 and v2 both use [https://github.com/etcd-io/bbolt bbolt], which is a Go-based port of Howard Chu's LMDB. Bbolt is a great embedded database for some workloads, but it is not enough for Pleiades. Bbolt is based on b+tree indexing, which is useful for lightning-fast reads but horrible at writes. To support more complex workloads, Pleiades v3 needs to use a log-structured merge-tree (LSM) database, where reads and writes are a bit more balanced.
The initial target replacement for bbolt was CockroachDB's [https://github.com/cockroachdb/pebble Pebble], which is a [https://github.com/cockroachdb/pebble/blob/master/docs/rocksdb.md customized port] of Facebook's [https://github.com/facebook/rocksdb RocksDB]. RocksDB is based on Google's [https://github.com/google/leveldb LevelDB]. However, as Pleiades v3 is no longer targeting Go, Pebble is no longer a good fit. TiKV's storage engine uses RocksDB, which is mind-blowingly optimized for modern storage hardware, and Pleiades v3 will likely use TiKV's [https://github.com/tikv/rust-rocksdb RocksDB bindings]. As all databases, at their lowest levels, are just disk-based key-value maps, using RocksDB is totally normal and saves a bunch of effort.
Pleiades v3 will use a single instance of RocksDB per node to store local, shard, and global data.
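One plausible way to carve a single RocksDB instance into those three classes of data is plain key prefixing, similar in spirit to how CockroachDB and TiKV prefix their internal keyspaces. The layout below is entirely hypothetical and exists only to illustrate the idea.
<syntaxhighlight lang="go">
package keys

import "encoding/binary"

// Hypothetical keyspace prefixes for splitting one RocksDB instance into
// the three classes of data; the real layout may differ entirely.
const (
	prefixLocal  byte = 'l' // node-local bookkeeping, never replicated
	prefixShard  byte = 's' // per-shard user data, replicated via raft
	prefixGlobal byte = 'g' // constellation-wide metadata
)

// ShardKey composes a key for user data living in a specific shard:
// 's' | shardID (8 bytes, big-endian so shards sort contiguously) | userKey.
func ShardKey(shardID uint64, userKey []byte) []byte {
	k := make([]byte, 0, 9+len(userKey))
	k = append(k, prefixShard)
	var id [8]byte
	binary.BigEndian.PutUint64(id[:], shardID)
	k = append(k, id[:]...)
	return append(k, userKey...)
}
</syntaxhighlight>
Column families would be the other obvious way to slice the instance; prefixing keeps everything in one ordered keyspace, which plays nicely with range scans.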
==== Raft ====
Pleiades v1 used Hashicorp's raft implementation, and Pleiades v2 uses a multi-raft library called [https://github.com/lni/dragonboat dragonboat]. Dragonboat is ''impressively performant'', but it's in Go and maintained by a very discreet maintainer who uses the handle <code>lni</code>. With Pleiades v3 being Rust-based, there are two real options for Rust-based raft: port dragonboat (not ideal) or fork & modify TiKV's [https://github.com/tikv/raft-rs raft-rs] (not ideal). Realistically, those are the two options, and neither is ideal. Porting dragonboat will be heavily error-prone because it contains '''extensive''' Go-specific performance modifications. raft-rs, meanwhile, uses the [https://github.com/tokio-rs/prost prost] protobuf library, which is heavy and slow compared to quick-protobuf, and it includes its own networking.
Pleiades v2's raft architecture was heavily influenced by dragonboat, and dragonboat has more bells and whistles than raft-rs. TiKV's internal multi-raft implementation, [https://github.com/tikv/tikv/tree/master/components/raftstore-v2 raftstore-v2], is complex and built on top of raft-rs using TiKV's [https://github.com/tikv/pd placement drivers] and [https://tikv.github.io/tikv-dev-guide/understanding-tikv/scalability/region.html regions] (their version of CockroachDB's range keys). Pleiades v3 can't really consume raftstore-v2 as-is because we don't support regions but rather range keys, and our multi-raft architectures are ''wildly different''.
Sienna's assertion is that we should fork & modify raft-rs to implement our networking changes and use the fork for now. Ideally, we'll submit the modifications back to raft-rs, but due to our custom network protocol, it's unlikely the modifications will be welcome. So long as our networking modifications are minimal and isolated, it should be fairly easy to pull in patches from upstream raft-rs as needed. As Raft is nearly 10 years old now, it's unlikely to go through major changes, so this is a fairly safe decision; it just comes with extensive maintenance burdens.
==== Ranges (re: sharding) ====
Pleiades v1 used no sharding (but also didn't use multi-raft), and Pleiades v2 uses various hashing algorithms for sharding. Architecturally, Pleiades v3 will leverage CockroachDB's range key architecture in conjunction with its gossip fabric. As this required a full rewrite regardless of the Rust migration, there are minimal changes here.
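As an illustration of what range keys mean in practice: each shard owns a contiguous, sorted span of the keyspace, and routing a key is a binary search over the range table. The Go sketch below is a toy; the types and names are made up for the example.
<syntaxhighlight lang="go">
package ranges

import (
	"bytes"
	"sort"
)

// Range describes a contiguous span of the keyspace, [Start, End),
// owned by one raft shard. Splits and merges adjust these boundaries
// as data grows, which is what lets the constellation self-scale.
type Range struct {
	Start, End []byte
	ShardID    uint64
}

// Lookup routes a key to its owning shard by binary-searching a range
// table sorted by Start key (as gossiped and cached on each node).
func Lookup(table []Range, key []byte) (uint64, bool) {
	// Index of the first range starting after key, minus one.
	i := sort.Search(len(table), func(i int) bool {
		return bytes.Compare(table[i].Start, key) > 0
	}) - 1
	if i < 0 {
		return 0, false
	}
	// An empty End means the range extends to the end of the keyspace.
	if r := table[i]; len(r.End) == 0 || bytes.Compare(key, r.End) < 0 {
		return r.ShardID, true
	}
	return 0, false
}
</syntaxhighlight>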
==== Transactions & MVCC ====
Pleiades v2 contained atomic transaction support, but Pleiades v3 does not currently have any transaction support planned. RocksDB does provide both pessimistic and optimistic transaction support, so atomic transactions may well continue to exist in Pleiades v3, but they are not guaranteed.
Pleiades v2 contained support for atomic MVCC operations through bbolt, but Pleiades v3 will not. RocksDB contains WAL and two-phase-commit functionality, which allows for similar operations, but it is not quite the same. Continued MVCC support is unplanned for Pleiades v3.
=== Administration ===
Pleiades v2 had no administration layer, and administration for Pleiades v3 was planned via direct integration with SWIM. Pleiades v2 contained several fabric CLI commands, and Pleiades v3 will contain similar constructs for the time being.
==== Authentication ====
Neither version of Pleiades contains authentication, as they were architectural prototypes, but Pleiades v3 will lay the core foundation required for fine-grained authorization. Originally, Pleiades v3 was going to bundle [https://openfga.dev OpenFGA] for authorization, but that has changed now that v3 will be a full bottom-up rewrite. Aside from some nice-to-have TLS functionality to make things easier to work with, authentication is going to be put on hold until at least v3.1, but possibly later.
== v2 ==
While no longer in use, these are some of the designs of the v2 architecture.
=== KV Store Runtime Architecture ===
This is the general architecture of the KVStore, the core monolithic key-value subsystem that underpins Pleiades.
==== Interface ====
The most up-to-date interfaces can be found in the code base, but here is the general interface structure. It includes two major features: transactions and kv operations. Transactions are atomic (by design), and are currently implicit due to the disk-based state machine implementation with bbolt.
<syntaxhighlight lang="go">
// ITransactionManager owns the lifecycle of raft-backed transactions.
type ITransactionManager interface {
	CloseTransaction(ctx context.Context, transaction *kvpb.Transaction) error
	Commit(ctx context.Context, transaction *kvpb.Transaction) *kvpb.Transaction
	GetNoOpTransaction(shardId uint64) *kvpb.Transaction
	GetTransaction(ctx context.Context, shardId uint64) (*kvpb.Transaction, error)
	SessionFromClientId(clientId uint64) (*dclient.Session, bool)
}

// IKVStore exposes the account, bucket, and key operations.
type IKVStore interface {
	CreateAccount(request *kvpb.CreateAccountRequest) (*kvpb.CreateAccountResponse, error)
	DeleteAccount(request *kvpb.DeleteAccountRequest) (*kvpb.DeleteAccountResponse, error)
	CreateBucket(request *kvpb.CreateBucketRequest) (*kvpb.CreateBucketResponse, error)
	DeleteBucket(request *kvpb.DeleteBucketRequest) (*kvpb.DeleteBucketResponse, error)
	GetKey(request *kvpb.GetKeyRequest) (*kvpb.GetKeyResponse, error)
	PutKey(request *kvpb.PutKeyRequest) (*kvpb.PutKeyResponse, error)
	DeleteKey(request *kvpb.DeleteKeyRequest) (*kvpb.DeleteKeyResponse, error)
}
</syntaxhighlight>
This isn't a perfectly ideal interface, but it's a good starting point. Once the transactions interface is properly implemented, it will be more effective.
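As a usage sketch, a consumer of <code>IKVStore</code> looks roughly like the following. The request fields are elided because the <code>kvpb</code> definitions in the code base are authoritative; everything else here is illustrative.
<syntaxhighlight lang="go">
// putThenGet is a hypothetical consumer of IKVStore. The concrete
// fields on the kvpb requests are defined in the code base.
func putThenGet(store IKVStore) error {
	if _, err := store.PutKey(&kvpb.PutKeyRequest{
		// ...account, bucket, key, and value fields from kvpb...
	}); err != nil {
		return err
	}
	// Reads go through the same raft-backed hot path described below.
	_, err := store.GetKey(&kvpb.GetKeyRequest{
		// ...the same addressing fields...
	})
	return err
}
</syntaxhighlight>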
==== Hot Path ====
This is a high-level architecture of the hot path for KV operations. KV operations ''must'' be as performant as possible, as every higher-order use case will be built on the kvstore and be wholly reliant on its performance.
{{Note|'''Note:''' ''This diagram assumes a raft shard has been properly provisioned and is known in advance. Check out the shard lifecycle documentation (<code>lifecycles-v2.md#Shards</code>) for more information.''}}
<pre>
%%{init: { 'logLevel': 'debug', 'theme': 'base' } }%%
graph
    kvEvent -- 1 --> server.kvStoreHandler -- 2 --> server.raftBboltManager
    server.raftBboltManager -- 3 --> server.transactionManager -- 4 --> runtime.ITransactionManager
    runtime.ITransactionManager --> dragonboat.nodeHost -. 5 .-> server.transactionManager -. 6 .-> server.raftBboltManager
    server.raftBboltManager -- 7 --> nodeHost.SyncPropose
    nodeHost.SyncPropose -- 7 --> nodeHost.IStateMachine -- 7 --> nodeHost.IOnDiskStateMachine
    nodeHost.IOnDiskStateMachine ---> kv.BBoltStateMachine
    kv.BBoltStateMachine -- 10 --> fsm.bboltStore -. 11 .-> kv.BBoltStateMachine
    kv.BBoltStateMachine -. 12 .-> nodeHost.IStateMachine -. 13 .-> nodeHost.SyncPropose -. 14 .-> server.raftBboltManager
    server.raftBboltManager -. 15 .-> server.transactionManager
    server.raftBboltManager -. 16 .-> server.kvStoreHandler -. 17 .-> kvEvent
</pre>
# A <code>kvEvent</code> occurs. Currently these arrive through the Connect<ref name="connect">[https://connectrpc.com/ Connect], Buf</ref> functionality using protobufs.
# Once the <code>kvEvent</code> occurs, it's handled by the <code>kvStoreHandler</code> in the <code>server</code> package. This layer currently exists only as a thin wrapper.
# A local transaction is requested, but this is currently not implemented.
# The interface is called, but it is backed by <code>dragonboat.NodeHost</code>.
# The transaction, if applicable, is bubbled up and stored in the transaction manager's local cache.
# The cached version is returned to the <code>raftBboltStoreManager</code>, which is just a packaging wrapper.
# The event is sent through the synchronous interface of <code>dragonboat.NodeHost.SyncPropose</code>, where it's repackaged as a byte array. While mildly inconvenient, this is useful as we can pass anything through this layer with minimal concerns.
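To make step 7 concrete, the byte-array repackaging looks roughly like this against dragonboat's v3 API. This is a hedged sketch: the real wrapper lives in the <code>server</code> package, uses the transaction manager's client sessions instead of a no-op session, and handles retries.
<syntaxhighlight lang="go">
package server

import (
	"context"
	"time"

	"github.com/lni/dragonboat/v3"
	"google.golang.org/protobuf/proto"
)

// propose marshals a protobuf into the opaque byte array that dragonboat
// replicates, then proposes it synchronously to the given raft shard.
func propose(nh *dragonboat.NodeHost, shardID uint64, msg proto.Message) error {
	cmd, err := proto.Marshal(msg)
	if err != nil {
		return err
	}
	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()
	// The no-op session skips client-session deduplication; the hot path's
	// transaction manager would supply a real session here instead.
	_, err = nh.SyncPropose(ctx, nh.GetNoOPSession(shardID), cmd)
	return err
}
</syntaxhighlight>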
c516448b83aa438a6f7ce43d2efbf78331e1036c
502
501
2023-11-05T04:34:08Z
Sienna
2
wikitext
text/x-wiki
== Overview ==
Pleiades is grouped into a few different classifications:
* Components
* Aspects
* Services
Each of these classifications provides different bits of functionality. Components are self-contained units of functionality that can be reused across different parts of Pleiades. They can be thought of as building blocks that can be combined with other components to create larger, more complex systems. Aspects, on the other hand, are cross-cutting features or functionality that affect multiple components or modules. Services are the long-running pieces that expose that functionality to the rest of the constellation.
Everything in these three classifications is either runtime-centric or library-centric. Runtime-centric pieces focus on managing the state of a Pleiades node (or larger constellation), while library-centric pieces only provide reusable functionality. For the most part, Pleiades aims to keep the runtime code fairly light, with a focal point on event-driven wrappers of library functionality.
== v3 Proposal ==
Here are the major components, aspects, and services that are a part of the v3 architecture.
=== Components ===
* HLC
* Raft Engine
* ZeroMQ
* RocksDB
=== Aspects ===
* Storage Engine
* Netcode & RPC framework
* Messaging substrate
=== Services ===
* Gossip
* kvstore
* Raft
* Messaging
* System
Pleiades is a collection of different types, layers, and aspects of technologies that enable it to operate successfully. Right now, Pleiades is at v2 - it's already gone through several early rewrites after validating assumptions, design patterns, etc. This doc describes the technologies being targeted for the v3 rewrite.
The technologies listed below are grouped by category, but not in any particular order.
=== Programming Language ===
Pleiades v1 and v2 both use Go. Go is a very powerful language and allowed Pleiades to go through many quick iterations of technology. Its concurrency model made it easy to design Pleiades' monolithic but modular architecture, and enabled high throughput on most workloads. However, Go has also been exceedingly limiting due to its memory management model.
The value of Go for Pleiades was [[User:Sienna|Sienna's]] familiarity with the ecosystem, a large and diverse CNCF ecosystem to pull libraries from, well-respected CNCF vendors with excellent reference libraries, and a vibrant community. However, Go's memory management model has been extremely limiting when it comes to performance. Due to Go's GC and automatic memory management, it's incredibly difficult to determine where objects are allocated and what their lifecycles are, and nil pointer dereferences are difficult to debug in a massive monolith. Go's parametric generics are simply too basic: they don't allow for covariance and can't realistically be used effectively at scale. Go also uses Plan9 assembly, which is nearly impossible to write due to a near-complete lack of documentation. Go is useful for infrastructure applications, but not low-level infrastructure.
Pleiades v3 is targeting a complete rewrite in Rust. Rust's memory management, lifetimes, generics, and general type system are substantially stronger than Go's, and it supports intrinsics through LLVM. Rust's ownership and lifetime model enables faster performance, and its threading model is much more robust than Go's goroutines. This does create extensive overhead, as large swaths of the existing Pleiades v2 code base come from third parties, and several of the more important subsystems will need to be completely rewritten from scratch, ported, or replaced with alternatives. Overall, Rust's performance characteristics, memory management, and LLVM integration make it a much more suitable language for Pleiades v3.
=== Networking ===
The core networking stack of Pleiades v3 will be based on QUIC, as it provides faster connection setup and streams that avoid head-of-line blocking. Pleiades v2 is currently a mixture of gRPC and varying TCP implementations.
==== QUIC ====
QUIC is the underlying networking technology. It's based off Google's SPDY protocol, and was ratified by IETF with [https://www.rfc-editor.org/rfc/rfc9000.html RFC 9000]. QUIC provides 0-RTT handshakes, multiple streams per connection, full TLS connection security, ordered bidi streaming, passive latency monitoring, and connection migrations. It is incredibly performant and is the underlying technology of HTTP/3.
The implementation that Pleiades v3 will likely use is [https://github.com/cloudflare/quiche quiche]. Cloudflare's networking backbone is some of the best in the industry, and their Rust implementation is among the most widely used. For more information on QUIC, see Cloudflare's [https://cloudflare-quic.com/ landing page].
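As a rough, non-authoritative sketch of what bootstrapping quiche might look like, here's a client-side config; the <code>pleiades/0</code> ALPN token and every limit below are placeholder assumptions, not settled decisions.
<pre>
// A minimal sketch, assuming recent quiche (~0.17) APIs; the ALPN token
// and all limits are illustrative placeholders.
fn build_config() -> Result<quiche::Config, quiche::Error> {
    let mut config = quiche::Config::new(quiche::PROTOCOL_VERSION)?;

    // Production Pleiades would pin real certificates; verification is
    // disabled here only to keep the sketch self-contained.
    config.verify_peer(false);
    config.set_application_protos(&[b"pleiades/0"])?;

    config.set_max_idle_timeout(30_000); // milliseconds
    config.set_initial_max_data(10_000_000); // connection-level flow control
    config.set_initial_max_stream_data_bidi_local(1_000_000);
    config.set_initial_max_streams_bidi(128); // room for many stream pairs
    Ok(config)
}
</pre>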
=== RPC & Messaging ===
While gRPC provides useful RPC functionality, it is HTTP/2-based, and being at the mercy of the ecosystem is miserable. Pleiades v2 uses it, and it is a major lesson learned for the project. gRPC is incredibly slow, and the ecosystem is driven by ''largest consumer needs'', so HTTP/3 support won't arrive for years, and even then it's still HTTP. Pleiades v2 also embeds [https://nats.io NATS].
Pleiades v3 will no longer use gRPC beyond bootstrapping, relying instead on two QUIC-based variations. It's expected that there will be two layers of RPC-style networking: one for the very low-level raft, gossip, and kvstore subsystems; and another for higher-order application functionality such as messaging, queuing, and pubsub. Being at the mercy of the gRPC ecosystem is hellish, miserable, and generally a great way to let your technology rot. That being said, protocol buffers are very powerful and useful for encoding and framing, so those will stay.
==== RPC ====
At the lowest level, Pleiades v3 will implement a custom QUIC-based protocol that combines magic bytes with protobuf payloads to handle framing, routing, and message passing. While QUIC supports ordered bidi streaming on a per-stream basis, it's likely that each stream handler will maintain two streams per protocol type for short-term simplicity. The use of magic bytes is primarily to annotate and notify changes in configurations, routing, etc., and will be limited since protobufs are fully framed. There is an open proposal for [[rtRPC|rtRPC]] that would support better message passing between services.
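To make the magic bytes + framed protobuf idea concrete, here's a minimal sender sketch; the <code>PAYLOAD_INCOMING</code> value echoes the draft rtRPC table, but the 4-byte length prefix is purely an illustrative assumption.
<pre>
use std::io::{self, Write};

// Hypothetical control byte announcing a routed protobuf payload; real
// byte values live in the rtRPC proposal and may change.
const PAYLOAD_INCOMING: u8 = 0x20;

/// Writes one magic byte followed by a length-prefixed, pre-encoded
/// protobuf payload. A sketch only: the real framing is still in design.
fn write_message<W: Write>(sink: &mut W, encoded_proto: &[u8]) -> io::Result<()> {
    sink.write_all(&[PAYLOAD_INCOMING])?;
    // 4-byte big-endian length prefix so the reader knows how much to buffer.
    sink.write_all(&(encoded_proto.len() as u32).to_be_bytes())?;
    sink.write_all(encoded_proto)
}
</pre>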
==== Messaging ====
For higher-order application messaging needs, Pleiades v2 currently uses embedded NATS as the queuing and pubsub messaging provider. NATS is incredibly powerful, but it's also incredibly heavy and written in Go. Outside of Go, the only major message queuing platform that seems to be a good fit is [https://zeromq.org/ ZeroMQ]. zmq is a very powerful C++ solution, there are several Rust bindings for it, and there is one [https://github.com/zeromq/zmq.rs full-Rust implementation]. The only limitation of zmq (bindings or native) is that there are currently no QUIC socket implementations.
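For a feel of the pub/sub model, here's a hedged sketch using the C-binding <code>zmq</code> crate (the pure-Rust zmq.rs exposes a different, async API); the endpoint and the <code>cre.</code> topic are invented for illustration.
<pre>
use std::thread::sleep;
use std::time::Duration;

fn pubsub_demo() -> zmq::Result<()> {
    let ctx = zmq::Context::new();

    let publisher = ctx.socket(zmq::PUB)?;
    publisher.bind("tcp://127.0.0.1:5556")?;

    let subscriber = ctx.socket(zmq::SUB)?;
    subscriber.connect("tcp://127.0.0.1:5556")?;
    subscriber.set_subscribe(b"cre.")?; // prefix-match subscription
    sleep(Duration::from_millis(100)); // crude guard against the PUB/SUB slow-joiner race

    publisher.send("cre.leave node-42", 0)?;
    let msg = subscriber.recv_string(0)?.expect("valid utf-8");
    println!("received: {msg}");
    Ok(())
}
</pre>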
=== Network Interfacing ===
Pleiades v3 will likely define a standard set of network traits that each library can implement to leverage the networking library in an RPC-adjacent manner. Whether this turns out to be a good design pattern depends on how it interacts with Rust's ownership model.
For the protocol buffer implementation, [https://crates.io/crates/quick-protobuf quick-protobuf] is currently attractive because it's low-level and uses clone-on-write. It also doesn't require <code>protoc</code> or other external tools, which is a major plus.
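If the trait-based approach pans out, the shape could resemble the sketch below; the trait and method names are invented for illustration, and protobuf encode/decode would live above this layer.
<pre>
// Hypothetical "network trait" sketch: libraries implement the handler,
// the runtime owns the QUIC plumbing. All names here are invented.
pub trait StreamService {
    /// Stable identifier used during service-type negotiation.
    fn service_type(&self) -> u8;

    /// Handles one decoded request payload and produces a response
    /// payload; protobuf encoding and decoding happen above this trait.
    fn handle(&mut self, request: &[u8]) -> Vec<u8>;
}
</pre>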
=== Clustering & Automation ===
Pleiades v2 uses powerful libraries from well-respected tech companies to manage clustering, membership, and other aspects of its autonomy. However, nearly all of these technologies are in Go and must be ported with non-trivial modifications, mostly to networking.
=== Gossip ===
Pleiades v2 targeted [https://github.com/hashicorp/serf Serf] for its internal gossiping structure. The value of Serf was its mixture of [https://www.cs.cornell.edu/projects/Quicksilver/public_pdfs/SWIM.pdf SWIM] and the [https://sites.cs.ucsb.edu/~ravenben/classes/276/papers/vivaldi-sigcomm04.pdf Vivaldi network tomography] system. However, Serf is written in Go and by Hashicorp, who are relicensing all of their new versions and products under the BSL moving forward. So something has to be done.
The value of SWIM + network tomography is its infectious gossiping, clustering, and location awareness. This allows Pleiades to loosely model a force-directed graph, where changes are rolled out through loosely connected nodes, and the network tomography allows for location-based clustering without needing to define locations. Pleiades v3 will keep SWIM and the Vivaldi network tomography functionality for its gossiping patterns. This is an essential architectural piece which allows Pleiades v3 to remain an automated constellation mesh database (re: fully-autonomous system). For the curious, Hashicorp published a [https://www.serf.io/docs/internals/simulator.html configurable convergence simulator] for SWIM; it's worth a look-see if you have performance questions.
However, there is a substantial bit of work needed to port Hashicorp's reference implementation and extensions. Hashicorp's [https://arxiv.org/abs/1707.00788 Lifeguard extensions] to SWIM are relatively minor, but the Vivaldi implementation contains several extensions from the [https://www.usenix.org/legacy/events/nsdi07/tech/full_papers/ledlie/ledlie_html/index_save.html Network Coordinates in the Wild] USENIX paper and IBM Research's [https://dominoweb.draco.res.ibm.com/492D147FCCEA752C8525768F00535D8A.html Euclidean Embedding] paper. While these extensions can be trusted due to long-term production usage, they also make it harder to cross-reference the papers.
The reference SWIM implementation (with Lifeguard) is called [https://github.com/hashicorp/memberlist memberlist]. memberlist bundles its own networking and a custom packet implementation for the protocol, but the Pleiades v3 port can't have either: the network messages need to be protobufs, and the networking implementation will need to support the QUIC-based stack that Pleiades will use. Otherwise, the rest of the functionality should be a relatively straightforward port using Rust's stdlib.
The Vivaldi implementation sits in the Serf [https://github.com/hashicorp/serf/tree/master/coordinate tree] and is fairly straightforward; most of the structs can be converted to protobufs for external interfacing.
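For intuition, here's a toy 2-D version of the core Vivaldi update rule from the SIGCOMM '04 paper; the production port would also need the adaptive timestep, error tracking, and the extensions mentioned above, and all names here are ours.
<pre>
#[derive(Clone, Copy)]
struct Coord { x: f64, y: f64 }

// One Vivaldi step: nudge our coordinate along the unit vector away from
// the peer, scaled by the prediction error (measured RTT minus the
// distance our coordinates currently predict).
fn vivaldi_update(local: Coord, remote: Coord, rtt_ms: f64, delta: f64) -> Coord {
    let (dx, dy) = (local.x - remote.x, local.y - remote.y);
    let dist = (dx * dx + dy * dy).sqrt().max(1e-6); // avoid divide-by-zero
    let error = rtt_ms - dist;
    Coord {
        x: local.x + delta * error * (dx / dist),
        y: local.y + delta * error * (dy / dist),
    }
}
</pre>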
=== Clocks (Under Construction) ===
nb (sienna): this changes with the advent of HUYGENS from Google in 2018.
Pleiades v1 and v2 do not implement clocks, and this is a major design flaw. During prototyping, none of the major versions ever made it far enough to need a clock. However, for ranges to work with atomic transactions, Pleiades v3 will need accurate clocks. Nodes will still rely on the OS-synced system clock, but Pleiades v3 will port CockroachDB's [https://github.com/cockroachdb/cockroach/blob/master/pkg/util/hlc/doc.go hybrid logical clock (HLC) implementation]. Lamport clocks have too much skew for a tightly-knit system like Pleiades, and CockroachDB's HLC implementation is stable in production. This port should be very straightforward.
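As a sketch of the mechanics being ported, here's a minimal HLC in the spirit of the CockroachDB design; clock-skew bounds and error handling are omitted, and the field and method names are ours, not CockroachDB's.
<pre>
// Hybrid logical clock: (wall, logical) pairs ordered lexicographically.
#[derive(Clone, Copy, PartialEq, PartialOrd)]
struct Hlc { wall: u64, logical: u32 }

struct Clock { last: Hlc }

impl Clock {
    // Timestamp for a local event, given the current physical clock.
    fn now(&mut self, physical: u64) -> Hlc {
        if physical > self.last.wall {
            self.last = Hlc { wall: physical, logical: 0 };
        } else {
            self.last.logical += 1; // physical clock hasn't advanced
        }
        self.last
    }

    // Merge a remote timestamp on message receipt.
    fn observe(&mut self, physical: u64, remote: Hlc) -> Hlc {
        let wall = physical.max(self.last.wall).max(remote.wall);
        let logical = if wall == self.last.wall && wall == remote.wall {
            self.last.logical.max(remote.logical) + 1
        } else if wall == self.last.wall {
            self.last.logical + 1
        } else if wall == remote.wall {
            remote.logical + 1
        } else {
            0
        };
        self.last = Hlc { wall, logical };
        self.last
    }
}
</pre>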
=== Lifecycle Automation ===
Pleiades v1 contained no dependency injection, but Pleiades v2 uses Uber's [https://github.com/uber-go/fx fx] framework. This gives a baseline DI framework that's good enough to handle startup and shutdown events. However, fx is overcomplicated and not worth porting, so Pleiades v3 can use whatever is most popular in the Rust ecosystem. DI is imperative because Pleiades is a modular monolith: there needs to be control of background services as well as DI for things like service-to-service clients, network handlers, etc.
Pleiades v1 and v2 did not contain lifecycle workflows, but Pleiades v3 will contain a small workflow engine to keep the internal lifecycle events manageable. As each node in the constellation is fully autonomous, the ability to handle complex workflows for CREs is important. This will help the project maintainers advance the constellation's internals without completely rewriting large swaths of internal logic every time a CRE workflow changes. Daniel Gerlag's [https://github.com/danielgerlag/workflow-core Workflow Core] is an excellent embeddable workflow engine in C# and is the reference implementation for the Pleiades v3 workflow engine. Not all features or functionality will be needed, so the port will primarily be just the workflow engine and enough netcode to keep it controllable and observable.
=== Config Automation ===
nb (sienna): this might be a remote configuration setting since Pleiades will be embedded
Pleiades v1 used Steve Francia's [https://github.com/spf13/viper viper] and [https://github.com/spf13/cobra cobra] libraries, whereas Pleiades v2 only used viper and a custom port of Mitchell Hashimoto's [https://github.com/mitchellh/cli CLI library]. Viper is a very powerful but overcomplicated configuration management library; it provides the core configuration, but it is like using a shotgun on work that needs a scalpel. For the CLI and configuration libraries, whatever is both popular in Rust and minimal will likely be the correct choice. Ideally, these would be external dependencies instead of ported code.
=== Storage ===
Storage in Pleiades v1 was an absolute mess (hey, it was a prototype lol), and Pleiades v2's storage layer is better, but still not great. Pleiades v3 aims to fix that by having a single, unified storage layer that provides local, shard, and global storage opportunities to all consumers (with caveats). Both Pleiades v1 and v2 are built on Raft, but Pleiades v2 uses multi-raft.
==== Disk Storage ====
Pleiades v1 and v2 both used [https://github.com/etcd-io/bbolt bbolt], a Go take on Howard Chu's LMDB design. Bbolt is a great embedded database for some workloads, but it is not enough for Pleiades. Bbolt is based on B+tree indexing, which delivers lightning-fast reads but poor write performance. To support more complex workloads, Pleiades v3 needs a log-structured merge-tree (LSM) database, where reads and writes are more balanced.
The initial target replacement for bbolt was CockroachDB's [https://github.com/cockroachdb/pebble Pebble], which is a [https://github.com/cockroachdb/pebble/blob/master/docs/rocksdb.md customized port] of Facebook's [https://github.com/facebook/rocksdb RocksDB]. RocksDB is based off Google's [https://github.com/google/leveldb LevelDB]. However, as Pleiades v3 is no longer targeting Go, Pebble is no longer a good fit. TiKV's storage engine uses RocksDB, which is heavily optimized for modern storage hardware, and Pleiades v3 will likely use TiKV's [https://github.com/tikv/rust-rocksdb RocksDB bindings]. Since every database ultimately needs a fast on-disk key-value layer at its lowest level, building on RocksDB is completely normal and saves a great deal of effort.
Pleiades v3 will use a single instance of RocksDB per node to store local, shard, and global data.
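Assuming the community <code>rust-rocksdb</code> crate API for illustration (TiKV's fork differs in places), the local/shard/global split could map onto column families within that single instance:
<pre>
use rocksdb::{Options, DB};

// Sketch: one RocksDB instance per node, with column families standing
// in for the local/shard/global split. The CF names are hypothetical.
fn open_store(path: &str) -> Result<(), rocksdb::Error> {
    let mut opts = Options::default();
    opts.create_if_missing(true);
    opts.create_missing_column_families(true);

    let db = DB::open_cf(&opts, path, ["local", "shard", "global"])?;

    let shard = db.cf_handle("shard").expect("cf was just created");
    db.put_cf(shard, b"range/0001/key", b"value")?;
    assert!(db.get_cf(shard, b"range/0001/key")?.is_some());
    Ok(())
}
</pre>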
=== Raft ===
Pleiades v1 used Hashicorp's raft implementation, and Pleiades v2 uses a multi-raft library called [https://github.com/lni/dragonboat dragonboat]. Dragonboat is ''impressively performant'', but it's in Go and maintained by a very discreet developer who uses the handle <code>lni</code>. With Pleiades v3 being Rust-based, there are two real options for Rust-based raft: port dragonboat, or fork & modify TiKV's [https://github.com/tikv/raft-rs raft-rs]; neither is ideal. Porting dragonboat would be heavily error-prone because it contains '''extensive''' Go-specific performance modifications. raft-rs, meanwhile, uses the [https://github.com/tokio-rs/prost prost] protobuf library, which is heavy and slow compared to quick-protobuf, and it includes its own networking.
Pleiades v2's raft architecture was heavily influenced by dragonboat, and dragonboat has more bells and whistles than raft-rs. TiKV's internal multi-raft implementation, [https://github.com/tikv/tikv/tree/master/components/raftstore-v2 raftstore-v2], is complex and built on top of raft-rs using TiKV's [https://github.com/tikv/pd placement drivers] and [https://tikv.github.io/tikv-dev-guide/understanding-tikv/scalability/region.html regions] (their version of CockroachDB's range keys). Pleiades v3 can't really consume raftstore-v2 as-is because Pleiades supports range keys rather than regions, and the multi-raft architectures are ''wildly different''.
'''Original:''' ''Sienna's assertion is that we should fork & modify <code>raft-rs</code> to implement our networking changes and use the fork for now. Ideally, we'll submit the modifications back to <code>raft-rs</code>, but due to our custom network protocol, it's unlikely the modification will be welcome. So long as our networking modifications are minimal and isolated, it should be fairly easy to pull in patches from upstream <code>raft-rs</code> as needed. As raft is nearly 10 years old now, it's unlikely to go through major changes, so this is a fairly safe decision, it just comes with extensive maintenance burdens.''
'''Updates:''' ''<code>raft-rs</code> uses ticks to operate, which isn't sustainable. <code>[https://github.com/datafuselabs/openraft openraft]</code> is completely async and will be more effective.''
=== Ranges (re: sharding) ===
Pleiades v1 used no sharding (but also didn't use multi-raft), and Pleiades v2 uses various hashing algorithms for sharding. Architecturally, Pleiades v3 will leverage CockroachDB's range key architecture in conjunction with its gossip fabric. As this required a full rewrite regardless of the Rust migration, there are minimal changes here.
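As a toy illustration of range-key addressing (not CockroachDB's actual code), a key resolves to the range descriptor with the greatest start key less than or equal to it:
<pre>
use std::collections::BTreeMap;

// Range descriptors indexed by start key; a key belongs to the range
// whose start key is the greatest one <= the key.
fn find_range<'a>(ranges: &'a BTreeMap<Vec<u8>, String>, key: &[u8]) -> Option<&'a String> {
    ranges.range(..=key.to_vec()).next_back().map(|(_, id)| id)
}

fn main() {
    let mut ranges = BTreeMap::new();
    ranges.insert(b"".to_vec(), "range-1".to_string()); // keys < "m"
    ranges.insert(b"m".to_vec(), "range-2".to_string()); // keys >= "m"
    assert_eq!(find_range(&ranges, b"apple").unwrap(), "range-1");
    assert_eq!(find_range(&ranges, b"zebra").unwrap(), "range-2");
}
</pre>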
=== Transactions & MVCC ===
Pleiades v2 contained atomic transaction support, but Pleiades v3 does not currently have any transaction support planned. RocksDB does contain pessimistic and optimistic transaction support, so atomic transactions may carry over into Pleiades v3, but that is not guaranteed.
Pleiades v2 contained support for atomic MVCC operations through bbolt, but Pleiades v3 will not. RocksDB contains WAL and two-phase commit functionality, which allows for similar operations, but is not quite the same. Continued MVCC support is unplanned for Pleiades v3.
=== Administration ===
Pleiades v2 had no administration layer, and administration for Pleiades v3 was planned via direct integration with SWIM. Pleiades v2 contained several fabric CLI commands, and Pleiades v3 will contain similar constructs for the time being.
=== Authentication ===
Neither version of Pleiades contains authentication, as they were architectural prototypes, but Pleiades v3 will lay the core foundation required for fine-grained authorization. Originally, Pleiades v3 was going to bundle [https://openfga.dev OpenFGA] for authorization, but that has changed now that v3 will be a full bottom-up rewrite. Aside from some nice-to-have TLS functionality to make things easier to work with, authentication is going to be put on hold until at least v3.1, but possibly later.
== v2 ==
While no longer in use, these are some of the designs of the v2 architecture.
=== KV Store Runtime Architecture ===
This is the general architecture of the KVStore, which is the core monolithic key-value architecture that underpins Pleiades.
==== Interface ====
The most up-to-date interfaces can be found in the code base, but here is the general interface structure. It includes two major features: transactions and kv operations. Transactions are atomic (by design), and are currently implicit due to the disk-based state machine implementation with bbolt.
// ITransactionManager owns the lifecycle of raft-backed transactions:
// creation, caching, commit, and close.
type ITransactionManager interface {
    CloseTransaction(ctx context.Context, transaction *kvpb.Transaction) error
    Commit(ctx context.Context, transaction *kvpb.Transaction) *kvpb.Transaction
    GetNoOpTransaction(shardId uint64) *kvpb.Transaction
    GetTransaction(ctx context.Context, shardId uint64) (*kvpb.Transaction, error)
    SessionFromClientId(clientId uint64) (*dclient.Session, bool)
}
// IKVStore exposes the account, bucket, and key operations that make up
// the core kv surface area.
type IKVStore interface {
    CreateAccount(request *kvpb.CreateAccountRequest) (*kvpb.CreateAccountResponse, error)
    DeleteAccount(request *kvpb.DeleteAccountRequest) (*kvpb.DeleteAccountResponse, error)
    CreateBucket(request *kvpb.CreateBucketRequest) (*kvpb.CreateBucketResponse, error)
    DeleteBucket(request *kvpb.DeleteBucketRequest) (*kvpb.DeleteBucketResponse, error)
    GetKey(request *kvpb.GetKeyRequest) (*kvpb.GetKeyResponse, error)
    PutKey(request *kvpb.PutKeyRequest) (*kvpb.PutKeyResponse, error)
    DeleteKey(request *kvpb.DeleteKeyRequest) (*kvpb.DeleteKeyResponse, error)
}
This isn't a perfectly ideal interface, but it's a good starting point. Once the transactions interface is properly implemented, it will be more effective.
==== Hot Path ====
This is a high-level architecture of the hot path for kv operations. KV operations ''must'' be as performant as possible, as every higher-order use case will be built on the kvstore and be wholly reliant on its performance.
{{Note|'''Note:''' ''This diagram assumes a raft shard has been properly provisioned and is known in advance. Check out [shard lifecycles](lifecycles-v2.md#Shards) for more information''}}
%%{init: { 'logLevel': 'debug', 'theme': 'base' } }%%
graph
kvEvent -- 1 --> server.kvStoreHandler -- 2 --> server.raftBboltManager
server.raftBboltManager -- 3 --> server.transactionManager -- 4 --> runtime.ITransactionManager
runtime.ITransactionManager --> dragonboat.nodeHost -. 5 .-> server.transactionManager -. 6 .-> server.raftBboltManager
server.raftBboltManager -- 7 --> nodeHost.SyncPropose
nodeHost.SyncPropose -- 8 --> nodeHost.IStateMachine -- 9 --> nodeHost.IOnDiskStateMachine
nodeHost.IOnDiskStateMachine ---> kv.BBoltStateMachine
kv.BBoltStateMachine -- 10 --> fsm.bboltStore -. 11 .-> kv.BBoltStateMachine
kv.BBoltStateMachine -. 12 .-> nodeHost.IStateMachine -. 13 .-> nodeHost.SyncPropose -. 14 .-> server.raftBboltManager
server.raftBboltManager -. 15 .-> server.transactionManager
server.raftBboltManager -. 16 .-> server.kvStoreHandler -. 17 .-> kvEvent
# A `kvEvent` occurs. Currently these arrive through the Connect<ref name="connect">[https://connectrpc.com/ Connect], Buf</ref> functionality using protobufs; the exact set of interface options is still being documented.
# Once the `kvEvent` occurs, it's handled by the `kvStoreHandler` in the `server` package. This layer is currently just a thin wrapper.
# A local transaction is requested, but is currently not implemented.
# The interface is called, but is backed by `dragonboat.NodeHost`
# The transaction, if applicable, is bubbled up and stored in the transaction manager's local cache.
# The cached version is returned to the `raftBboltStoreManager`, which is just a packaging wrapper
# The event is sent through the synchronous interface of `dragonboat.NodeHost.SyncPropose`, and it's repackaged as a byte array. While mildly inconvenient, it's useful as we can pass anything through this layer with minimal concerns.
f34a316170231b478c841e138685d70bf9424e79
Template:Note
10
4
7
2023-11-05T01:05:12Z
Sienna
2
Created page with "<languages/> <onlyinclude>{{#if: {{{1|{{{content|{{{text|{{{demo|<noinclude>demo</noinclude>}}}}}}}}}}}} | <templatestyles src="Note/styles.css" /><div role="note" class="note note-{{#switch: {{{2|{{{type|}}}}}} |gotcha=error |warning=warn |notice=info |=info |#default={{{2|{{{type|}}}}}} }} {{#ifeq:{{{inline|}}}|1|note-inline}}">{{{1|{{{content|{{{text}}}}}}}}}</div> | File:OOjs UI icon lightbulb-yellow.svg|18px|alt=<translate><!--T:1--> Note..."
wikitext
text/x-wiki
<languages/>
<onlyinclude>{{#if: {{{1|{{{content|{{{text|{{{demo|<noinclude>demo</noinclude>}}}}}}}}}}}} | <templatestyles src="Note/styles.css" /><div role="note" class="note note-{{#switch: {{{2|{{{type|}}}}}}
|gotcha=error
|warning=warn
|notice=info
|=info
|#default={{{2|{{{type|}}}}}}
}} {{#ifeq:{{{inline|}}}|1|note-inline}}">{{{1|{{{content|{{{text}}}}}}}}}</div>
| [[File:OOjs UI icon lightbulb-yellow.svg|18px|alt=<translate><!--T:1--> Note</translate>|link=]] '''<translate><!--T:2--> Note:</translate>''' }}<!--
--></onlyinclude>
{{documentation|content=
<translate>
== Usage == <!--T:3-->
</translate>
<pre>
{{Note|text=Foo}}
{{Note|type=info|text=Foo}}
{{Note|type=reminder|text=Foo}}
{{Note|type=reminder|text=Multiple<br>lines<br>of<br>text}}
{{Note|type=warn|text=Foo}}
{{Note|type=error|text=Foo}}
{{Note}} <translate nowrap><!--T:6--> Loose test</translate>
* Text {{Note|inline=1|text=Foo}}
</pre>
{{Note|text=Foo}}
{{Note|type=info|text=Foo}}
{{Note|type=reminder|text=Foo}}
{{Note|type=reminder|text=Multiple<br>lines<br>of<br>text}}
{{Note|type=warn|text=Foo}}
{{Note|type=error|text=Foo}}
{{Note}} <translate><!--T:4--> Loose test</translate>
* Text {{Note|inline=1|text=Foo}}
== Parameters ==
{{Note/doc}}
== See also ==
* {{tl|warn}}, shortcut for this template with <code>type=warning</code>.
* {{tl|mbox}}, and in particular the namespace-agnostic {{tl|ombox}}, which by default resembles a typical "info" template.
}}
[[Category:Templates{{#translation:}}|{{PAGENAME}}]]
9ca8639dde6299d75c337d46db3ef58bf3fb0294
User:Sienna
2
5
9
2023-11-05T01:10:24Z
Sienna
2
Created page with "= About = Hi, I'm the creator and project leave of the Nova Engine!"
wikitext
text/x-wiki
= About =
Hi, I'm the creator and project lead of the Nova Engine!
3397a3e189baa28731e0e25e613639da89c0ec09
Constellation Mesh
0
6
10
2023-11-05T01:41:50Z
Sienna
2
Created page with "Because Pleiades is like nothing else which exists, by design, it's hard to describe how it is different. Initially, Pleiades was compared to a globally distributed data fabric, but that was missing a core aspect of Pleiades: autonomy. We tried to compare it to a data mesh, but there's no architectural alignment with the specific business domains expected with a data mesh. After many different iterations, and comparing Pleiades to many different system models, User:Sie..."
wikitext
text/x-wiki
Because Pleiades is like nothing else which exists, by design, it's hard to describe how it is different. Initially, Pleiades was compared to a globally distributed data fabric, but that was missing a core aspect of Pleiades: autonomy. We tried to compare it to a data mesh, but there's no architectural alignment with the specific business domains expected with a data mesh. After many different iterations, and comparing Pleiades to many different system models, [[User:Sienna|Sienna]] landed on ''constellation mesh''.
== What is a ''constellation mesh''? ==
A mesh, [https://www.oracle.com/integration/what-is-data-mesh/ as defined by Oracle], is a distributed architecture for data management. That fits the distributed architecture model of Pleiades without forcing domain alignment, but it doesn't define the type of mesh. In systems engineering, the term "constellation" is commonly used to refer to autonomous satellites which work in coordination to provide different aspects of a distributed, unified, and autonomous data set. Pleiades is a distributed, autonomous system working in independent coordination, with data management as its primary feature (re: fancy distributed database).
So what makes Pleiades a distributed, autonomous system working in coordination? Pleiades is designed from the ground up to handle an exabyte's worth of data with only a single operator. This means internal architectures require a mixture of distribution and autonomy wherever possible, giving visibility to the operator without requiring hands-on operational management. Configurations, workload scheduling, and many other internal operations in the constellation must happen independently, and the scale requires a leaderless, decentralized design. This also increases complexity, but it stays manageable when modeled correctly.
== Understanding Configuration Propagation ==
One of the most useful models for understanding the systemic impacts of automated decision-making in a decentralized network is a force-directed graph. In force-directed graphs, attraction is generally modeled as a spring force, <math>F_s = kx</math>, and repulsion as a Coulomb-style force, <math>|F| = \frac{1}{4\pi\epsilon_0}\frac{|q_1 q_2|}{r^2}</math>; iterative simulation demonstrates how mechanical equilibrium can be achieved across the entire graph. Here is a force-directed graph using [https://en.wikipedia.org/wiki/Verlet_integration Verlet integration] to demonstrate network propagation between nodes.
{{Note|'''Note:''' ''Click and drag the nodes to see how force applied to one node affects the rest''}}
<html>
<iframe width="100%" height="684" frameborder="0" src="https://observablehq.com/embed/@d3/force-directed-graph?cells=chart" />
</html>
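For the curious, a single position-Verlet step with the Hooke's-law springs above looks roughly like the sketch below; it's purely illustrative of the embedded visual, not anything Pleiades itself computes.
<pre>
#[derive(Clone, Copy)]
struct Node { pos: (f64, f64), prev: (f64, f64) }

// One position-Verlet step for a node pulled by springs (F_s = kx)
// toward each neighbour, assuming unit mass.
fn verlet_step(node: &mut Node, neighbours: &[(f64, f64)], k: f64, dt: f64) {
    let (mut fx, mut fy) = (0.0, 0.0);
    for &(nx, ny) in neighbours {
        fx += k * (nx - node.pos.0);
        fy += k * (ny - node.pos.1);
    }
    // x_next = 2x - x_prev + F * dt^2
    let next = (
        2.0 * node.pos.0 - node.prev.0 + fx * dt * dt,
        2.0 * node.pos.1 - node.prev.1 + fy * dt * dt,
    );
    node.prev = node.pos;
    node.pos = next;
}
</pre>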
While Pleiades doesn’t use Verlet integration or operate in this exact way, it’s a useful visual example. To understand how it relates to Pleiades, it’s helpful to understand the nuances of this visual.
Automated decision-making in a distributed system can have very similar sets of characteristics, but instead of a ''mechanical'' equilibrium being simulated, a ''virtual change equilibrium'' would be achieved through network propagation. ''Virtual Change Equilibrium'' (VCE) is the result of a ''Constellation Runtime Event'' (CRE, re: a change) being successfully propagated throughout the entire constellation.
Pleiades' internal clustering is modeled with graph connectivity instead of centralized membership. Each node in the constellation is only aware of its neighbours, some top-level metadata, and how to handle CREs. A CRE is ''a neighbour-only broadcast''. For example, when a node is shutting down, it will broadcast the `leave` CRE to its local neighbours, who will repeat the same message, and so on, until the constellation has reached virtual change equilibrium. Other CREs, such as `join`, `query`, and `update`, follow the same propagation model to achieve VCE.
The constellation model is enabled through SWIM<ref name="swim" />, and Pleiades uses Hashicorp's implementation with their Lifeguard extensions<ref name="lifeguard" />. While SWIM allows for the constellation's members to be modeled concretely, Pleiades also leverages network tomography to handle VCE. Pleiades uses the Hashicorp network tomography library, `coordinate`<ref name="tomo" />, to provide real-time computed network coordinates for each node in the constellation. SWIM enables constellation membership, `coordinate` provides locality, and the internal lifecycle state machines together make Pleiades an autonomous distributed system.
Using a workload adjustment event, the implementation nuances become easier to understand. When a node containing the leader of a range replica receives an internal scaling event (re: scale up or scale down), it will broadcast a CRE with some change metadata, and the constellation will quickly achieve VCE asynchronously. From the triggering node's perspective, VCE happens once the broadcast call returns. After triggering the CRE, the node will broadcast a `query` CRE asking for the nearest neighbours with available capacity to create a new range replica. Once the neighbours have been identified, the node will communicate directly with the closest identified neighbour to instantiate a range replica. The identified neighbour will broadcast the new range replica CRE as part of the initial VCE, and the original node will start the relevant scaling workflow. Once the workflow has finished, the original node which triggered the CRE will broadcast a final CRE with the final change metadata.
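As a toy model of neighbour-only broadcast, the sketch below floods a CRE hop-by-hop until every reachable node has seen it; real propagation is asynchronous, gossip-based, and probabilistic, so treat this as intuition only.
<pre>
use std::collections::{HashSet, VecDeque};

// Flood a CRE through an adjacency list until every reachable node has
// seen it; the returned count is the set of nodes at VCE.
fn propagate(adjacency: &[Vec<usize>], origin: usize) -> usize {
    let mut seen = HashSet::from([origin]);
    let mut queue = VecDeque::from([origin]);
    while let Some(node) = queue.pop_front() {
        for &neighbour in &adjacency[node] {
            if seen.insert(neighbour) {
                queue.push_back(neighbour); // each neighbour re-broadcasts
            }
        }
    }
    seen.len()
}
</pre>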
While there are many things not covered in that example, the internal autonomy of the constellation allows for things like scaling events to be handled by any node, while keeping the entire constellation aware of the change. This is what makes Pleiades a ''constellation mesh database''. Hopefully that helps! Please feel free to open a discussion if you have questions.
=== Notes ===
<references>
<ref name="swim">[https://www.cs.cornell.edu/projects/Quicksilver/public_pdfs/SWIM.pdf|DOI], SWIM: scalable weakly-consistent infection-style process group membership protocol</ref>
<ref name="lifeguard">[https://arxiv.org/abs/1707.00788|ArXiv], Lifeguard: Local Health Awareness for More Accurate Failure Detection</ref>
<ref name="tomo">[https://github.com/hashicorp/serf/tree/master/coordinate|GitHub], Serf</ref>
</references>
8510dc79161bb1464526ec969af6497d948649f3
Contributing
0
7
11
2023-11-05T01:43:51Z
Sienna
2
Created page with "Thanks for being interested in contributing to the Nova Engine project! === Trunk-based Development === Trunk-based development is a version control management practice where developers merge small, frequent updates to a core “trunk” or main branch. It’s a common practice among teams and part of the SRE lifecycle since it streamlines merging and integration phases. Trunk-based development is a required practice of [https://www.atlassian.com/continuous-delivery/con..."
wikitext
text/x-wiki
Thanks for being interested in contributing to the Nova Engine project!
=== Trunk-based Development ===
Trunk-based development is a version control management practice where developers merge small, frequent updates to a core “trunk” or main branch. It’s a common practice among teams and part of the SRE lifecycle since it streamlines merging and integration phases. Trunk-based development is a required practice of [https://www.atlassian.com/continuous-delivery/continuous-integration/trunk-based-development true CI/CD].
Rather than relying on feature branches, trunk-based development has each developer work locally and independently, then merge their changes back into the main branch (the trunk) at least once a day. [https://www.gitkraken.com/blog/trunk-based-development Merges must occur whether or not feature changes or additions are complete].
=== Everything Else ===
* Open an issue with a proposed change if it's larger than a bugfix
* Be very careful about memory allocations
* Use linear commits whenever possible
* Fix-forward, no backports
* Document your code
* Ask questions if you get stuck!
a022378732dff892b1033cba0fc4028ecc2d3366
12
11
2023-11-05T02:15:05Z
Sienna
2
init
wikitext
text/x-wiki
Thanks for being interested in contributing to the Nova Engine project!
=== Trunk-based Development ===
Trunk-based development is a version control management practice where developers merge small, frequent updates to a core “trunk” or main branch. It’s a common practice among teams and part of the SRE lifecycle since it streamlines merging and integration phases. Trunk-based development is a required practice of [https://www.atlassian.com/continuous-delivery/continuous-integration/trunk-based-development true CI/CD].
Rather than relying on feature branches, trunk-based development has each developer work locally and independently, then merge their changes back into the main branch (the trunk) at least once a day. [https://www.gitkraken.com/blog/trunk-based-development Merges must occur whether or not feature changes or additions are complete].
=== Research Environment ===
We use Zotero for collecting and analyzing research. You can find our current research library in the [https://www.zotero.org/groups/5160664/pleiades_supercompute Pleiades Supercompute] group. If you'd like to submit research to be applied to Pleiades, please let one of the project maintainers know and we'll start a discussion around it!
=== Development Environment ===
Tools you'll need:
* [https://www.rust-lang.org/ Rust]
* [https://gerrit.googlesource.com/git-repo repo] from Google
* [https://multipass.run/ multipass] from Canonical
* An [https://login.ubuntu.com/ Ubuntu One] account
* Editor or IDE, [https://code.visualstudio.com/ Visual Studio Code] or [https://www.jetbrains.com/remote-development/gateway/ Jetbrains Gateway] are recommended
All of our development tools are based around Ubuntu, with some compatibility for arm64 macOS. We also support [https://code.visualstudio.com/ Visual Studio Code] and [https://www.jetbrains.com/remote-development/gateway/ Jetbrains Gateway] via client installation. Several bits of the server code require Linux with the `PREEMPT_RT` patch due to the hard real-time requirements of the runtime server. To make development easier, we use `multipass` from Canonical. You'll also need a [https://ubuntu.com/pro free Ubuntu Pro] subscription to get the `PREEMPT_RT` kernel.
Once you've installed `multipass`, it's recommended to create an alias of `mp` to make it easier to work with.
== Setup ==
First, you'll need to install `repo`. You can find the current instructions on the [https://source.android.com/docs/setup/download#installing-repo Android website]. `repo` is used to manage all of the core repositories. Once you've installed `repo`, you can set up the workspace.
=== Setting up multipass (Under Construction) ===
Add your SSH public key to the cloud-init file found at `build/cloud-config.yaml`. Otherwise you won't be able to log into your VM.
# make sure you're in the root of the repo
# create the vm. this uses 4 cpu, 16gib of ram, and 40gb of disk space.
# adjust these values as needed
mp launch -c 4 -m 16G -d 40G --cloud-init build/cloud-config.yaml -n primary
# copy your git stuff to make life easier
mp transfer -r ~/.ssh primary:/home/ubuntu/
mp transfer -r ~/.git* primary:/home/ubuntu/
# copy the pleiades source
cd .. && mp transfer -r pleiades primary:/home/ubuntu/ && cd pleiades/
Once the VM has been provisioned, it will say something along the lines of `Launched: primary` and a few mounting notes. While you wait, [https://ubuntu.com/pro/dashboard log into your Ubuntu Pro account] and grab your token. Once the machine has been provisioned, you can access it and finish the installation:
# get into the vm
mp shell primary
# (optional) install the recommended vscode extensions
code --install-extension "GitHub.copilot"
code --install-extension "minherz.copyright-inserter"
code --install-extension "ms-vscode-remote.vscode-remote-extensionpack"
code --install-extension "ms-vsliveshare.vsliveshare"
code --install-extension "redhat.vscode-commons"
code --install-extension "redhat.vscode-xml"
code --install-extension "redhat.vscode-yaml"
code --install-extension "remcohaszing.schemastore"
code --install-extension "rust-lang.rust-analyzer"
code --install-extension "timonwong.shellcheck"
# attach your pro subscription
sudo pro attach <token>
# install the rt kernel
sudo pro enable realtime-kernel
# reboot
sudo reboot now
'''Warning:''' ''Even if you have an Intel processor, do not install the `intel-iotg` variant of the RT kernel as Pleiades is currently built on ARM platforms.''
At this point, the core VM is set up. Hooking it up to vscode is pretty simple at this point:
# Get the IP of the VM
## You can use `mp info primary --format json | jq -r '.info.primary.ipv4[0]'` to make it simpler
# Add a new SSH host to vscode
# Connect to the SSH host
# Select the `pleiades` folder from the `ubuntu` user's home directory.
At this point you are all set up!
'''Warning:''' ''The source code from your laptop will be copied to your VM, and then whatever changes you make there won't be mirrored back. Use `mp transfer` to bring all the source files back if you need them on your laptop.''
=== OpenPGP Keys ===
If you're like the "good" devs and use OpenPGP keys for git security, you'll also want to add those. [[User:Sienna|Sienna]] uses Keybase to manage her keys just for ease of use. You'll want to add those to your devbox as well.
# export the keys
keybase pgp export > key.asc
keybase pgp export -s > priv-key.asc
# transfer the keys to the devbox
mp transfer key.asc primary:/home/ubuntu
mp transfer priv-key.asc primary:/home/ubuntu
# import the keys in the devbox
gpg --import key.asc
gpg --import priv-key.asc
And now you're all set up!
== Everything Else ==
* Open an issue with a proposed change if it's larger than a bugfix
* Be very careful about memory allocations
* Use linear commits whenever possible
* Fix-forward, no backports
* Document your code
* Ask questions if you get stuck!
a542686746c664ca2d63da58af27fee579b9194a
509
12
2023-11-05T05:21:56Z
Sienna
2
wikitext
text/x-wiki
Thanks for being interested in contributing to the Nova Engine project!
=== Trunk-based Development ===
Trunk-based development is a version control management practice where developers merge small, frequent updates to a core “trunk” or main branch. It’s a common practice among teams and part of the SRE lifecycle since it streamlines merging and integration phases. Trunk-based development is a required practice of [https://www.atlassian.com/continuous-delivery/continuous-integration/trunk-based-development true CI/CD].
Rather than relying on feature branches, trunk-based development has each developer work locally and independently, then merge their changes back into the main branch (the trunk) at least once a day. [https://www.gitkraken.com/blog/trunk-based-development Merges must occur whether or not feature changes or additions are complete].
=== Research Environment ===
We use Zotero for collecting and analyzing research. You can find our current research library in the [https://www.zotero.org/groups/5160664/pleiades_supercompute Pleiades Supercompute] group. If you'd like to submit research to be applied to Pleiades, please let one of the project maintainers know and we'll start a discussion around it!
=== Development Environment ===
Tools you'll need:
* [https://www.rust-lang.org/ Rust]
* [https://gerrit.googlesource.com/git-repo repo] from Google
* [https://multipass.run/ multipass] from Canonical
* An [https://login.ubuntu.com/ Ubuntu One] account
* Editor or IDE, [https://code.visualstudio.com/ Visual Studio Code] or [https://www.jetbrains.com/remote-development/gateway/ Jetbrains Gateway] are recommended
* [https://buf.build buf] CLI
All of our development tools are based around Ubuntu, with some compatibility for arm64 macOS. We also support [https://code.visualstudio.com/ Visual Studio Code] and [https://www.jetbrains.com/remote-development/gateway/ Jetbrains Gateway] via client installation. Several bits of the server code require Linux with the `PREEMPT_RT` patch due to the hard real-time requirements of the runtime server. To make development easier, we use `multipass` from Canonical. You'll also need a [https://ubuntu.com/pro free Ubuntu Pro] subscription to get the `PREEMPT_RT` kernel.
Once you've installed `multipass`, it's recommended to create an alias of `mp` to make it easier to work with.
== Setup ==
First, you'll need to install <code>repo</code>. You can find the current instructions on the [https://source.android.com/docs/setup/download#installing-repo Android website]. <code>repo</code> is used to manage all of the core repositories. Once you've installed <code>repo</code>, set up the workspace.
<blockquote>If you're on macOS, you might need to edit the shebang of <code>~/.bin/repo</code> to use python3</blockquote>
mkdir nova-workspace && cd nova-workspace
repo init -u https://review.gerrithub.io/shieldmaidens/manifest --config-name -b mainline
repo sync
This will clone all of the repositories to your local workspace.
=== Setting up multipass (Under Construction) ===
Add your SSH public key to the cloud-init file found at `build/cloud-config.yaml`. Otherwise you won't be able to log into your VM.
# make sure you're in the root of the repo
# create the vm. this uses 4 cpu, 16gib of ram, and 40gb of disk space.
# adjust these values as needed
mp launch -c 4 -m 16G -d 40G --cloud-init build/cloud-config.yaml -n primary
# copy your git stuff to make life easier
mp transfer -r ~/.ssh primary:/home/ubuntu/
mp transfer -r ~/.git* primary:/home/ubuntu/
# copy the pleiades source
cd .. && mp transfer -r pleiades primary:/home/ubuntu/ && cd pleiades/
Once the VM has been provisioned, it will say something along the lines of `Launched: primary` and a few mounting notes. While you wait, [https://ubuntu.com/pro/dashboard log into your Ubuntu Pro account] and grab your token. Once the machine has been provisioned, you can access it and finish the installation:
# get into the vm
mp shell primary
# (optional) install the recommended vscode extensions
code --install-extension "GitHub.copilot"
code --install-extension "minherz.copyright-inserter"
code --install-extension "ms-vscode-remote.vscode-remote-extensionpack"
code --install-extension "ms-vsliveshare.vsliveshare"
code --install-extension "redhat.vscode-commons"
code --install-extension "redhat.vscode-xml"
code --install-extension "redhat.vscode-yaml"
code --install-extension "remcohaszing.schemastore"
code --install-extension "rust-lang.rust-analyzer"
code --install-extension "timonwong.shellcheck"
# attach your pro subscription
sudo pro attach <token>
# install the rt kernel
sudo pro enable realtime-kernel
# reboot
sudo reboot now
'''Warning:''' ''Even if you have an Intel processor, do not install the `intel-iotg` variant of the RT kernel as Pleiades is currently built on ARM platforms.''
At this point, the core VM is set up. Hooking it up to vscode is pretty simple at this point:
# Get the IP of the VM
## You can use `mp info primary --format json | jq -r '.info.primary.ipv4[0]'` to make it simpler
# Add a new SSH host to vscode
# Connect to the SSH host
# Select the `pleiades` folder from the `ubuntu` user's home directory.
At this point you are all set up!
'''Warning:''' ''The source code from your laptop will be copied to your VM, and then whatever changes you make there won't be mirrored back. Use `mp transfer` to bring all the source files back if you need them on your laptop.''
=== OpenPGP Keys ===
If you're like the "good" devs and use OpenPGP keys for git security, you'll also want to add those. [[User:Sienna|Sienna]] uses Keybase to manage her keys just for ease of use. You'll want to add those to your devbox as well.
# export the keys
keybase pgp export > key.asc
keybase pgp export -s > priv-key.asc
# transfer the keys to the devbox
mp transfer key.asc primary:/home/ubuntu
mp transfer priv-key.asc primary:/home/ubuntu
# import the keys in the devbox
gpg --import key.asc
gpg --import priv-key.asc
And now you're all set up!
== Everything Else ==
* Open an issue with a proposed change if it's larger than a bugfix
* Be very careful about memory allocations
* Use linear commits whenever possible
* Fix-forward, no backports
* Document your code
* Ask questions if you get stuck!
2a53bc62a944fcc05d568aeb7c350218f84a4b60
510
509
2023-11-05T05:26:09Z
Sienna
2
/* Setup */
wikitext
text/x-wiki
Thanks for being interested in contributing to the Nova Engine project!
=== Trunk-based Development ===
Trunk-based development is a version control management practice where developers merge small, frequent updates to a core “trunk” or main branch. It’s a common practice among teams and part of the SRE lifecycle since it streamlines merging and integration phases. Trunk-based development is a required practice of [https://www.atlassian.com/continuous-delivery/continuous-integration/trunk-based-development true CI/CD].
Rather than relying on feature branches, trunk-based development has each developer work locally and independently, then merge their changes back into the main branch (the trunk) at least once a day. [https://www.gitkraken.com/blog/trunk-based-development Merges must occur whether or not feature changes or additions are complete].
=== Research Environment ===
We use Zotero for collecting and analyzing research. You can find our current research library in the [https://www.zotero.org/groups/5160664/pleiades_supercompute Pleiades Supercompute] group. If you'd like to submit research to be applied to Pleiades, please let one of the project maintainers know and we'll start a discussion around it!
=== Development Environment ===
Tools you'll need:
* [https://www.rust-lang.org/ Rust]
* [https://gerrit.googlesource.com/git-repo repo] from Google
* [https://multipass.run/ multipass] from Canonical
* An [https://login.ubuntu.com/ Ubuntu One] account
* Editor or IDE, [https://code.visualstudio.com/ Visual Studio Code] or [https://www.jetbrains.com/remote-development/gateway/ Jetbrains Gateway] are recommended
* [https://buf.build buf] CLI
All of our development tools are based around Ubuntu, with some compatibility for arm64 macOS. We also support [https://code.visualstudio.com/ Visual Studio Code] and [https://www.jetbrains.com/remote-development/gateway/ Jetbrains Gateway] via client installation. Several bits of the server code require Linux with the `PREEMPT_RT` patch due to the hard real-time requirements of the runtime server. To make development easier, we use `multipass` from Canonical. You'll also need a [https://ubuntu.com/pro free Ubuntu Pro] subscription to get the `PREEMPT_RT` kernel.
Once you've installed `multipass`, it's recommended to create an alias of `mp` to make it easier to work with.
== Setup ==
First, you'll need to install <code>repo</code>. You can find the current instructions on the [https://source.android.com/docs/setup/download#installing-repo Android website]. <code>repo</code> is used to manage all of the core repositories. Once you've installed <code>repo</code>, set up the workspace.
<blockquote>If you're on macOS, you might need to edit the shebang of <code>~/.bin/repo</code> to use python3</blockquote>
mkdir nova-workspace && cd nova-workspace
repo init -u https://review.gerrithub.io/shieldmaidens/manifest --config-name -b mainline
repo sync
This will clone all of the repositories to your local workspace.
=== Setting up multipass (Under Construction) ===
Add your SSH public key to the cloud-init file found at `build/cloud-config.yaml`. Otherwise you won't be able to log into your VM.
# make sure you're in the root of the repo
# create the vm. this uses 4 cpu, 16gib of ram, and 40gb of disk space.
# adjust these values as needed
mp launch -c 4 -m 16G -d 40G --cloud-init build/cloud-config.yaml -n primary
# copy your git stuff to make life easier
mp transfer -r ~/.ssh primary:/home/ubuntu/
mp transfer -r ~/.git* primary:/home/ubuntu/
# copy the pleiades source
cd .. && mp transfer -r pleiades primary:/home/ubuntu/ && cd pleiades/
Once the VM has been provisioned, it will say something along the lines of `Launched: primary` and a few mounting notes. While you wait, [https://ubuntu.com/pro/dashboard log into your Ubuntu Pro account] and grab your token. Once the machine has been provisioned, you can access it and finish the installation:
# get into the vm
mp shell primary
# (optional) install the recommended vscode extensions
code --install-extension "GitHub.copilot"
code --install-extension "minherz.copyright-inserter"
code --install-extension "ms-vscode-remote.vscode-remote-extensionpack"
code --install-extension "ms-vsliveshare.vsliveshare"
code --install-extension "redhat.vscode-commons"
code --install-extension "redhat.vscode-xml"
code --install-extension "redhat.vscode-yaml"
code --install-extension "remcohaszing.schemastore"
code --install-extension "rust-lang.rust-analyzer"
code --install-extension "timonwong.shellcheck"
# attach your pro subscription
sudo pro attach <token>
# install the rt kernel
sudo pro enable realtime-kernel
# reboot
sudo reboot now
'''Warning:''' ''Even if you have an Intel processor, do not install the `intel-iotg` variant of the RT kernel as Pleiades is currently built on ARM platforms.''
At this point, the core VM is set up. Hooking it up to vscode is pretty simple at this point:
# Get the IP of the VM
## You can use `mp info primary --format json | jq -r '.info.primary.ipv4[0]'` to make it simpler
# Add a new SSH host to vscode
# Connect to the SSH host
# Select the `pleiades` folder from the `ubuntu` user's home directory.
At this point you are all set up!
'''Warning:''' ''The source code from your laptop will be copied to your VM, and then whatever changes you make there won't be mirrored back. Use `mp transfer` to bring all the source files back if you need them on your laptop.''
=== OpenPGP Keys ===
If you're like the "good" devs and use OpenPGP keys for git security, you'll also want to add those. [[User:Sienna|Sienna]] uses Keybase to manage her keys just for ease of use. You'll want to add those to your devbox as well.
# export the keys
keybase pgp export > key.asc
keybase pgp export -s > priv-key.asc
# transfer the keys to the devbox
mp transfer key.asc primary:/home/ubuntu
mp transfer priv-key.asc primary:/home/ubuntu
# import the keys in the devbox
gpg --import key.asc
gpg --import priv-key.asc
And now you're all set up!
== Everything Else ==
* Open an issue with a proposed change if it's larger than a bugfix
* Be very careful about memory allocations
* Use linear commits whenever possible
* Fix-forward, no backports
* Document your code
* Ask questions if you get stuck!
04e75271851413d23f81fff023c8b7c8692f3f35
MediaWiki:Citizen-footer-tagline
8
8
15
2023-11-05T02:19:30Z
Sienna
2
Created page with "Provided to you by the Pleiades Authours. Help us keep the documentation up to date!"
wikitext
text/x-wiki
Provided to you by the Pleiades Authors. Help us keep the documentation up to date!
8e28acc50ebd92defbc42a48fecb80d4566ce976
MediaWiki:Citizen-footer-desc
8
9
16
2023-11-05T02:20:43Z
Sienna
2
Created page with "Provided to you by the Pleiades Authours. Help us keep it up to date <3"
wikitext
text/x-wiki
Provided to you by the Pleiades Authors. Help us keep it up to date <3
913b33570c91f59cdb3031da78c52ae26e303338
Shieldmaidens Wiki:Copyrights
4
10
17
2023-11-05T02:21:21Z
Sienna
2
Created page with "This is a test of the copyright section?"
wikitext
text/x-wiki
This is a test of the copyright section?
899499b67ad0fbc92ac1899566762d59f6cdaf93
Nova Engine
0
11
18
2023-11-05T02:23:56Z
Sienna
2
Created page with "Landing Page Check out our [[Research Library|research library]], where we keep an accurate list of where to find various bits of research that's used to inform the designs of Nova & Pleiades"
wikitext
text/x-wiki
Landing Page
Check out our [[Research Library|research library]], where we keep an accurate list of where to find various bits of research that's used to inform the designs of Nova & Pleiades
5932c5ea694bcd67b93cae0ca27ba09cc59f50bc
Research Library
0
12
19
2023-11-05T02:32:19Z
Sienna
2
Created page with "Empty"
wikitext
text/x-wiki
Empty
3159fe421b3221381b3c778dc1c3c26e4540be37
RtRPC
0
254
503
2023-11-05T04:38:23Z
Sienna
2
Created page with "'''Feature Name:''' Pleiades Wire Protocol v1.0 (rtRPC) '''Status:''' draft '''Start Date:''' 23 August 2023 '''Authors:''' Sienna Lloyd [mailto:sienna@linux.com sienna@r3t.io] = Summary = Pleiades v3's internal architecture is removing a dependency on gRPC and other RPC frameworks to gain some technical independence and flexibility. This also comes with an added burden of needing to define a dedicated wire protocol. This document defines the layout of the v1 wire..."
wikitext
text/x-wiki
'''Feature Name:''' Pleiades Wire Protocol v1.0 (rtRPC)
'''Status:''' draft
'''Start Date:''' 23 August 2023
'''Authors:''' Sienna Lloyd [mailto:sienna@linux.com sienna@r3t.io]
= Summary =
Pleiades v3's internal architecture is removing a dependency on gRPC and other RPC frameworks to gain some technical independence and flexibility. This also comes with an added burden of needing to define a dedicated wire protocol. This document defines the layout of the v1 wire protocol, and how to successfully implement it. Generated clients and servers are well outside the scope of this document; however this protocol enables easy code generation.
<blockquote>[!info] The intent of this protocol is to be simple enough that novice systems programmers can implement clients while also being powerful enough to meet long-term needs.
</blockquote>
= Motivation =
<blockquote>[!tldr] gRPC sucks
</blockquote>
gRPC is slow, heavy, and focused on supporting its largest consumers. The technology is old and stale, and in some languages, such as Go, the default implementations are effectively hardcoded. Other RPC frameworks either offer minimal support or lack substantial features. Pleiades v3's internal architecture can't continue to progress while also maintaining ties to gRPC.
The technical motivation is ''less is more''. To effectively meet the performance requirements of Pleiades at scale, the networking protocol must also be performant. gRPC is HTTP-based, whereas the v1 wire protocol will be UDP-based with QUIC. This means Pleiades nodes and clients can connect and immediately send data via 0-RTT without round-tripping, and can use bidirectional streaming over individual streams, mux over multiple streams, or any other pattern.
<blockquote>[!info] For more information on QUIC, read [https://www.rfc-editor.org/rfc/rfc9000.html RFC 9000]. It's long but worth the read.
</blockquote>
By removing the dependency on gRPC, we also open up access to Pleiades overall, limiting it only to QUIC and protocol buffers. While this new format removes gRPC, it does continue with protocol buffers. Protocol buffers are an industry standard for data encoding, and moving away from them would only make data interfaces more difficult.
This design also allows Pleiades to have a very simple, but heavily muxed service implementation for RPC-style services.
= Technical design =
The Pleiades Wire Protocol v1 (PWP) is simple in architecture, but detailed in implementation. As an important piece of context, PWP is based around the concept of streaming instead of call and response. Stream programming is a different functional architecture than call and response, and it drives different architectural decisions.
Generally, there are a few core constructs in PWP:
* ''stream pairs''
* ''magic bytes''
* ''payloads''
* ''contexts''
Stream pairs are sets of two bidi streams within a single QUIC connection. The first stream allows for negotiation of the second stream. As QUIC supports <math>2^{62}</math> stream IDs per connection, it's much simpler to set up separate request and response streams than to try muxing over a single stream; re: a ''stream pair''.
Magic bytes are fairly straightforward: they provide basic in-stream context that serves as control signalling.
Payloads are just that, and come in two forms: ''metadata bytes'' and ''messages''. Metadata bytes are short, simple <code>uintX</code>-style metadata response payloads that answer simple questions. Messages are protobuf-encoded payloads that contain application-level requests and responses.
Contexts are administrative references used to understand and debug a stream pair. Contexts are generally abstract, but can be concretely implemented.
With these core constructs, an entire RPC-style contract can be built with minimal effort on top of QUIC. QUIC provides the base streaming abstractions for us, and there's very little we have to do to set that up. A key takeaway about the core technical design is the distinct lack of framing. Framing adds significant overhead and is designed for transports with significant inconsistencies. As QUIC provides ordered streams with retry buffers, Pleiades is guaranteed to get composited messages in order. Structured framing provides no real value for its high maintenance cost. However, frame synchronization is a key architectural idea that is being kept.
<blockquote>[!info] For more information on framing, see the Wikipedia article on [https://en.wikipedia.org/wiki/Frame_(networking) frame design].
</blockquote>
Frame synchronization in PWP is less about frames and more about stream synchronization. Ultimately, the difference between frame synchronization and stream synchronization is the chunk of data which is parsed. For more classical framing, such as Ethernet or TCP frames, there are standard packet transmission sizes that inform the reader of how much information to read, parse, and return. Framing requires larger buffers, more memory allocation, and more processor cycles to manage. In leaky or inconsistent environments, this is a reasonable tradeoff, but the value of QUIC is that it abstracts this for us at the lowest levels. To a client, a QUIC stream is a guaranteed-delivery data stream: QUIC is a hardline into a switch vs TCP's wifi connection.
As PWP is a streaming protocol, not a call-and-response protocol, the frameless design allows ridiculously small signals to be transmitted across the wire while providing massive control contexts. As an example, with only 16 total bytes transmitted, a server and client will have established an entire service construct ready for application-level messages to be passed back and forth. Including version checking adds an extra 2 bytes, bringing the total transmission to 18 bytes. As a comparison, just the frame headers of HTTP/2 require 18 total bytes without any payload, there's no inferable context, and the call and response has been completed. HTTP/3 uses the same semantics, but is implemented on top of QUIC instead of native TCP.
The value of using a streaming protocol comes from timing and per-client throughput decisions. For example, a client could open a connection, create the initial stream, send the <code>HELLO</code> magic byte, wait for the response stream magic byte and its respective payload, send the service type request, wait for the response, and continue operations in a synchronous fashion. There is nothing wrong with that client design, and it would work well for mobile devices or low-end clients with performance limitations. However, a client could also do everything from opening the connection to sending the first RPC message without ever receiving a response from the server. This allows for immediate communication at transmission speed: as long as the client processes the payloads it receives in order, it achieves the same end result in a fraction of the time.
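A minimal sketch of that pipelined pattern follows; whether <code>STREAM_SETUP_COMPLETE</code> may be sent before the response stream ID arrives is an interpretation of this draft, and the Kvstore value <code>0x03</code> comes from the metadata payload table further down.
<syntaxhighlight lang="rust">
/// Optimistic handshake: every request byte is written up front, and the
/// caller matches responses to requests purely by arrival order.
async fn pipelined_handshake(req_tx: &mut quinn::SendStream) -> anyhow::Result<()> {
    // HELLO (0x01), STREAM_SETUP_COMPLETE (0x03), then SERVICE_TYPE_REQUEST
    // (0x10) for the Kvstore service (0x03), back-to-back with no waiting.
    req_tx.write_all(&[0x01, 0x03, 0x10, 0x03]).await?;
    Ok(())
}
</syntaxhighlight>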
<blockquote>[!tldr] Context is a construct from graph computing. Context is the localized relevance of something as it relates to a command or operation. Contexts in PWP are set by the magic bytes, and can change the set of operations an implementing client is using.
</blockquote>
== Stream Pairs ==
<blockquote>[!todo] Finish this section
</blockquote>
== Magic Bytes ==
Magic bytes are contextual bytes of information that help clients and servers understand the state of an overall stream. They are strictly <code>uint8</code> values. Below is the table of magic bytes for PWP v1.0.
{| class="wikitable"
!width="2%"| Byte
!width="11%"| Usage
!width="5%"| Sender
!width="74%"| Notes
!width="6%"| Metadata Payload Size
|-
| <code>0x01</code>
| <code>HELLO</code>
| Client
| Initial byte sent by the connecting client on a new stream; purely for connectivity verification.
|
|-
| <code>0x02</code>
| <code>RESPONSE_STREAM_ID_INCOMING</code>
| Server
| Sent by the server immediately after a <code>HELLO</code> byte has been received.
| <code>uint64</code>
|-
| <code>0x03</code>
| <code>STREAM_SETUP_COMPLETE</code>
| Client
| Sent from the client to the server once the request and response streams have been established. Once this byte is received by the client, service negotiation can begin. At this point, the application-level base connection has been established.
|
|-
| <code>0x05</code>
| <code>SERVER_VERSION_REQUEST</code>
| Client
| Sent by the client to the server to verify version compatibility.
| <code>uint8</code>
|-
| <code>0x06</code>
| <code>SERVER_VERSION_RESPONSE</code>
| Server
| Response from the server containing the maximum supported protocol version.
| <code>uint8</code>
|-
| <code>0x10</code>
| <code>SERVICE_TYPE_REQUEST</code>
| Client
| A request sent by the client for a specific service connection.
| <code>uint8</code>
|-
| <code>0x11</code>
| <code>SERVICE_TYPE_RESPONSE</code>
| Server
| Response sent by the server verifying the service connection.
| <code>uint8</code>
|-
| <code>0x12</code>
| <code>SERVICE_STATUS_REQUEST</code>
| Client
| Request sent by the client to the server requesting a healthcheck of the service.
| <code>uint8</code>
|-
| <code>0x13</code>
| <code>SERVICE_STATUS_RESPONSE</code>
| Server
| Response from the server containing the healthcheck byte from the requested service.
| <code>uint8</code>
|-
| <code>0x20</code>
| <code>PAYLOAD_TYPE_INCOMING</code>
| Client or Server
| Prelude byte for an incoming message specifying the type of message.
| <code>uint8</code>
|-
| <code>0x21</code>
| <code>PAYLOAD_CHECKSUM_INCOMING</code>
| Client or Server
| Prelude byte for the CRC-32C checksum of a message.
| <code>uint32</code>
|-
| <code>0x22</code>
| <code>PAYLOAD_SIZE_INCOMING</code>
| Client or Server
| Prelude byte specifying the size of the incoming message.
| <code>uint64</code>
|-
| <code>0xD0</code>
| <code>LATENCY_TEST_START</code>
| Client or Server
| Notification of an incoming latency test.
| <code>uint64</code>
|-
| <code>0xD1</code>
| <code>LATENCY_TEST_STOP</code>
| Client or Server
| Notification of the end of an established latency test.
| <code>uint64</code>
|-
| <code>0xE0</code>
| <code>HEARTBEAT</code>
| Client or Server
| Heartbeat, or keepalive, ping.
|
|-
| <code>0xE1</code>
| <code>HEARTBEAT_FREQUENCY_SETTING_INCOMING</code>
| Client
| Sent by the client to notify the server of a change in the heartbeat frequency.
| <code>uint8</code>
|-
| <code>0xFF</code>
| <code>CLOSE</code>
| Client or Server
| Final byte sent in a stream to notify of stream closure.
|
|}
Each of these magic bytes represents a different piece of stream state, and for the most part their behavior follows directly from the table.
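One way to mirror the table in code is a <code>u8</code>-backed enum with a fallible conversion. This is a sketch following the Usage column, not the canonical Pleiades implementation.
<syntaxhighlight lang="rust">
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MagicByte {
    Hello = 0x01,
    ResponseStreamIdIncoming = 0x02,
    StreamSetupComplete = 0x03,
    ServerVersionRequest = 0x05,
    ServerVersionResponse = 0x06,
    ServiceTypeRequest = 0x10,
    ServiceTypeResponse = 0x11,
    ServiceStatusRequest = 0x12,
    ServiceStatusResponse = 0x13,
    PayloadTypeIncoming = 0x20,
    PayloadChecksumIncoming = 0x21,
    PayloadSizeIncoming = 0x22,
    LatencyTestStart = 0xD0,
    LatencyTestStop = 0xD1,
    Heartbeat = 0xE0,
    HeartbeatFrequencySettingIncoming = 0xE1,
    Close = 0xFF,
}

impl TryFrom<u8> for MagicByte {
    type Error = u8;

    /// Maps a wire byte to its magic byte, returning the raw value if unknown.
    fn try_from(b: u8) -> Result<Self, u8> {
        use MagicByte::*;
        Ok(match b {
            0x01 => Hello,
            0x02 => ResponseStreamIdIncoming,
            0x03 => StreamSetupComplete,
            0x05 => ServerVersionRequest,
            0x06 => ServerVersionResponse,
            0x10 => ServiceTypeRequest,
            0x11 => ServiceTypeResponse,
            0x12 => ServiceStatusRequest,
            0x13 => ServiceStatusResponse,
            0x20 => PayloadTypeIncoming,
            0x21 => PayloadChecksumIncoming,
            0x22 => PayloadSizeIncoming,
            0xD0 => LatencyTestStart,
            0xD1 => LatencyTestStop,
            0xE0 => Heartbeat,
            0xE1 => HeartbeatFrequencySettingIncoming,
            0xFF => Close,
            other => return Err(other),
        })
    }
}
</syntaxhighlight>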
== Payloads ==
Payloads are split into two core types: metadata and messages.
Metadata payloads are specific to each magic byte and serve as simple responses to simple requests. The maximum size of a metadata payload is the size of a <code>uint64</code>, or 8 bytes.
Messages are application-specific payloads used in requests, responses, and application-level stream messages. They are encoded protocol buffers for the specific bundled services.
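As a sketch of how a message and its preludes might go onto the wire: the prelude ordering shown here (type, checksum, size) is one plausible reading of this draft rather than a normative sequence, the <code>crc</code> crate's Castagnoli constant stands in for CRC-32C, and big-endian encoding is assumed.
<syntaxhighlight lang="rust">
use crc::{Crc, CRC_32_ISCSI};

// CRC-32C (Castagnoli), as used by the PAYLOAD_CHECKSUM_INCOMING prelude.
const CRC32C: Crc<u32> = Crc::<u32>::new(&CRC_32_ISCSI);

/// Writes one application message with its three preludes.
async fn send_message(
    tx: &mut quinn::SendStream,
    payload_type: u8,
    encoded: &[u8], // an already-encoded protobuf message
) -> anyhow::Result<()> {
    // PAYLOAD_TYPE_INCOMING (0x20) + uint8 message type.
    tx.write_all(&[0x20, payload_type]).await?;
    // PAYLOAD_CHECKSUM_INCOMING (0x21) + uint32 CRC-32C.
    tx.write_all(&[0x21]).await?;
    tx.write_all(&CRC32C.checksum(encoded).to_be_bytes()).await?;
    // PAYLOAD_SIZE_INCOMING (0x22) + uint64 size, then the message itself.
    tx.write_all(&[0x22]).await?;
    tx.write_all(&(encoded.len() as u64).to_be_bytes()).await?;
    tx.write_all(encoded).await?;
    Ok(())
}
</syntaxhighlight>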
=== Metadata Payloads ===
These payloads create, enrich, or change the contexts of a stream pair for clients and servers. Some values are dictated by the protocol, whereas others are specific to the service or implementation.
{| class="wikitable"
!width="24%"| Payload
!width="5%"| Size
!width="4%"| Value
!width="65%"| Notes
|-
| RESPONSE_STREAM_ID
| <code>uint64</code>
|
| This value is picked by the server and will always be one ID higher than the initial stream ID.
|-
| SERVICE_TYPE_REQUEST
| <code>uint8</code>
|
| Services accepting external connections.
|-
|
|
| <code>0x01</code>
| Gossip service.
|-
|
|
| <code>0x02</code>
| Raft service.
|-
|
|
| <code>0x03</code>
| Kvstore service.
|-
|
|
| <code>0x04</code>
| Messaging service.
|-
|
|
| <code>0x05</code>
| System service.
|-
| SERVICE_TYPE_RESPONSE
| <code>uint8</code>
| <code>0x00</code>
| Service is ready and accepting connections
|-
|
|
| <code>0x01</code>
| Service is not available, retryable, no error
|-
|
|
| <code>0x02</code>
| Service is not available, retryable, incoming error
|-
|
|
| <code>0x03</code>
| Service is not available, non-retryable, no error
|-
|
|
| <code>0x04</code>
| Service is not available, non-retryable, incoming error
|-
| SERVICE_STATUS_REQUEST
| <code>uint8</code>
|
| Used to request the status of a specific service. See SERVICE_TYPE_REQUEST for which values to use.
|-
| SERVICE_STATUS_RESPONSE
| <code>uint8</code>
| <code>0x00</code>
| Service is healthy and accepting connections.
|-
|
|
| <code>0x01</code>
| Service is unhealthy.
|-
| PAYLOAD_TYPE_INCOMING
| <code>uint8</code>
|
| Payload types are specific to each service, see the service implementations for details.
|-
| PAYLOAD_CHECKSUM_INCOMING
| <code>uint32</code>
|
| This is the incoming checksum of the pending payload.
|-
| PAYLOAD_SIZE_INCOMING
| <code>uint64</code>
|
| The size of the incoming payload in bytes.
|-
| LATENCY_TEST_START
| <code>uint64</code>
|
| The identifier of the latency test.
|-
| LATENCY_TEST_STOP
| <code>uint64</code>
|
| The identifier of the latency test.
|-
| HEARTBEAT_FREQUENCY_SETTING_INCOMING
| <code>uint8</code>
|
| The frequency, in seconds, of expected heartbeat messages.
|}
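As an example of acting on these values, here is a small sketch of a retry decision driven by SERVICE_TYPE_RESPONSE; the actual retry policy is left to implementers.
<syntaxhighlight lang="rust">
/// Interprets a SERVICE_TYPE_RESPONSE metadata byte per the table above.
fn should_retry(response: u8) -> Result<bool, String> {
    match response {
        0x00 => Ok(false),       // ready and accepting connections
        0x01 | 0x02 => Ok(true), // not available, but retryable
        0x03 | 0x04 => Err("service unavailable, non-retryable".to_string()),
        other => Err(format!("unknown SERVICE_TYPE_RESPONSE {other:#04x}")),
    }
}
</syntaxhighlight>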
== Contexts ==
<blockquote>[!todo] Finish this section
</blockquote>
== Versioning ==
<blockquote>[!todo] Finish this section
</blockquote>
== Drawbacks ==
...
== Rationale and Alternatives ==
...
= Explain it to folks outside your team =
Audience: PMs, doc writers, end-users, Pleiades contributors in other areas of the project.
= Unresolved questions =
Audience: all participants to the RFC review.
a8726070e12e01889518d5422443b8f71a8aed16
File:Logo.jpeg
6
255
506
2023-11-05T04:49:35Z
Sienna
2
wikitext
text/x-wiki
da39a3ee5e6b4b0d3255bfef95601890afd80709
Code of Conduct
0
256
508
2023-11-05T05:09:13Z
Sienna
2
Created page with "= Contributor Covenant Code of Conduct = == Our Pledge == We as members, contributors, and leaders pledge to make participation in our community a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation. We pledge to act..."
wikitext
text/x-wiki
= Contributor Covenant Code of Conduct =
== Our Pledge ==
We as members, contributors, and leaders pledge to make participation in our community a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation.
We pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community.
== Our Standards ==
Examples of behavior that contributes to a positive environment for our community include:
* Demonstrating empathy and kindness toward other people
* Being respectful of differing opinions, viewpoints, and experiences
* Giving and gracefully accepting constructive feedback
* Accepting responsibility and apologizing to those affected by our mistakes, and learning from the experience
* Focusing on what is best not just for us as individuals, but for the overall community
Examples of unacceptable behavior include:
* The use of sexualized language or imagery, and sexual attention or advances of any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or email address, without their explicit permission
* Other conduct which could reasonably be considered inappropriate in a professional setting
== Enforcement Responsibilities ==
Community leaders are responsible for clarifying and enforcing our standards of acceptable behavior and will take appropriate and fair corrective action in response to any behavior that they deem inappropriate, threatening, offensive, or harmful.
Community leaders have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, and will communicate reasons for moderation decisions when appropriate.
== Scope ==
This Code of Conduct applies within all community spaces, and also applies when an individual is officially representing the community in public spaces. Examples of representing our community include using an official e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event.
== Enforcement ==
Instances of abusive, harassing, or otherwise unacceptable behavior may be reported to the community leaders responsible for enforcement at [mailto:sienna@r3t.io sienna@r3t.io]. All complaints will be reviewed and investigated promptly and fairly.
All community leaders are obligated to respect the privacy and security of the reporter of any incident.
== Enforcement Guidelines ==
Community leaders will follow these Community Impact Guidelines in determining the consequences for any action they deem in violation of this Code of Conduct:
=== 1. Correction ===
'''Community Impact''': Use of inappropriate language or other behavior deemed unprofessional or unwelcome in the community.
'''Consequence''': A private, written warning from community leaders, providing clarity around the nature of the violation and an explanation of why the behavior was inappropriate. A public apology may be requested.
=== 2. Warning ===
'''Community Impact''': A violation through a single incident or series of actions.
'''Consequence''': A warning with consequences for continued behavior. No interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, for a specified period of time. This includes avoiding interactions in community spaces as well as external channels like social media. Violating these terms may lead to a temporary or permanent ban.
=== 3. Temporary Ban ===
'''Community Impact''': A serious violation of community standards, including sustained inappropriate behavior.
'''Consequence''': A temporary ban from any sort of interaction or public communication with the community for a specified period of time. No public or private interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, is allowed during this period. Violating these terms may lead to a permanent ban.
=== 4. Permanent Ban ===
'''Community Impact''': Demonstrating a pattern of violation of community standards, including sustained inappropriate behavior, harassment of an individual, or aggression toward or disparagement of classes of individuals.
'''Consequence''': A permanent ban from any sort of public interaction within the community.
== Attribution ==
This Code of Conduct is adapted from the [https://www.contributor-covenant.org Contributor Covenant], version 2.0, available at https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
Community Impact Guidelines were inspired by [https://github.com/mozilla/diversity Mozilla's code of conduct enforcement ladder].
For answers to common questions about this code of conduct, see the FAQ at https://www.contributor-covenant.org/faq. Translations are available at https://www.contributor-covenant.org/translations.
9f4e48b5502adfbc8cc7e4a76d6ee40d0441a54b
Disk layout
0
257
511
2023-11-19T00:56:56Z
71.33.220.226
0
Added data layout page
wikitext
text/x-wiki
== General Layout ==
Here is the overall hierarchy of data storage within Pleiades:
Monolithic KVStore -> KVRangeStore -> Raft Shard -> Raft Replica -> RocksDB -> Host
Ultimately, RocksDB is the core storage engine on which everything is built. To optimize data storage and lower the number of files we have to track, data for multiple Raft Replicas is colocated into a single RocksDB instance. However, each replica's data is split into its own column family so we can support atomic writes across multiple replicas on the same host without contention.
There are a few optimizations we can make through namespacing and delineation that are specific to RocksDB. The first is the column families, which provide isolation between the replicas. The second is namespacing the Raft metadata into the <code>/raft</code> namespace and application data into <code>/data</code>. The third is a Raft-specific optimization: linearization - all operations, while asynchronous, are completely linear. These three optimizations allow a host to have almost complete atomicity for pretty much every key - we still wrap each write in a transaction, however.
To make it easier on the brain, here's a simple way of understanding how things are laid out on disk. Keep in mind this representation is human-friendly, whereas the actual implementation uses column families & byte-specific layouts.
/shardId/raft/<keys> -> <values>
/shardId/data/<keys> -> <values>
Over time, this might change, but for the most part these locations are static.
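To make the colocation scheme concrete, here is a minimal sketch using the <code>rocksdb</code> crate; the crate choice, the column family naming, and the human-friendly key prefixes are illustrative assumptions, not the actual Pleiades implementation.
<syntaxhighlight lang="rust">
use rocksdb::{Options, DB};

/// Opens one RocksDB instance for the host, with one column family per
/// colocated replica.
fn open_host_store(path: &str, shard_ids: &[u64]) -> Result<DB, rocksdb::Error> {
    let mut opts = Options::default();
    opts.create_if_missing(true);
    opts.create_missing_column_families(true);
    let cfs: Vec<String> = shard_ids.iter().map(|id| format!("shard-{id}")).collect();
    DB::open_cf(&opts, path, cfs)
}

/// Writes application data under the /data namespace of a replica's family.
fn put_app_data(db: &DB, shard_id: u64, key: &[u8], value: &[u8]) -> Result<(), rocksdb::Error> {
    let cf = db.cf_handle(&format!("shard-{shard_id}")).expect("column family exists");
    let namespaced = [b"/data/".as_slice(), key].concat();
    db.put_cf(cf, namespaced, value)
}
</syntaxhighlight>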
== Raft Layout ==
While the namespaces for shardId are column families, the core namespace delimiters are actually for faster sorting of the keys.
/shardId/raft/vote -> vote
/shardId/raft/last_purged_log_id
/shardId/raft/logs/<index> -> logId
/shardId/raft/snapshots/metadata -> snapshotMetadata
/shardId/raft/snapshots/<id> -> <internalRepresentation?>
== Data Key Encodings ==
The key encodings for application data are fairly straightforward but also incredibly powerful for addressing. Generally, Pleiades supports tagging, where keys are tagged with specific values that allow for fast retrievals, descriptor storage, and other general aspects. For each binary key (re: keys created by applications), 2 appended bytes are reserved for metadata that lets us support complex key usages. The overall byte alignment looks like so:
[binary-key][delimiter, tag]
There are a few different types of delimiters that make decoding a bit easier and more consistent:
{| class="wikitable"
|+ Delimiters
|-
! Delimiter !! Value
|-
| Tag || <code>.</code>
|-
| Latest Version || <code>@</code>
|-
| Specific Version || <code>:</code>
|}
Generally, this allows us to support a few different use cases: general tags, the latest version, and specific versions. Regarding versioning, the latest version tag is a quick way for us to fetch the latest version of a key, as well as potentially signalling a cascade of data updates to older key versions.
{| class="wikitable"
|+ Tags
|-
! Tag !! Value
|-
| Latest Version || <code>l</code>
|-
| Specific Version || <code>uint8</code>/<code>u8</code>
|-
| Key Value Pair Descriptor || <code>d</code>
|}
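A small sketch of how those trailer bytes might be appended to an application key (illustrative only; the real implementation is byte-specific inside the column families):
<syntaxhighlight lang="rust">
/// <key>@l -> the latest version of the key.
fn latest_key(user_key: &[u8]) -> Vec<u8> {
    let mut k = user_key.to_vec();
    k.extend_from_slice(b"@l");
    k
}

/// <key>:<uint8> -> a specific version of the key.
fn versioned_key(user_key: &[u8], version: u8) -> Vec<u8> {
    let mut k = user_key.to_vec();
    k.push(b':');
    k.push(version);
    k
}

/// <key>.d -> the key value pair descriptor.
fn descriptor_key(user_key: &[u8]) -> Vec<u8> {
    let mut k = user_key.to_vec();
    k.extend_from_slice(b".d");
    k
}
</syntaxhighlight>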
To understand how the last two bytes would logically look on disk, it would look something like this:
/shardId/data/<key>.d -> <keyValuePairDescriptor>
/shardId/data/<key>@l -> <bin-data>
/shardId/data/<key>:3 -> <bin-data>
/shardId/data/<key>:2 -> <bin-data>
/shardId/data/<key>:1 -> <bin-data>
As an important bit of information, <code>latest</code> is always version <code>255</code>, as that is the maximum supported number of versions. Using the above example, there are 4 versions of the key, <code>[4, 3, 2, 1]</code>, with the <code>latest</code> tag being version 4. If the key is going to be updated, creating a 5th version, the order of operations looks like this (a code sketch follows the list):
# <code>key.d</code> is read so we can get key metadata, which contains the latest version information.
# <code>key@l</code> is read so we can get the current value.
# <code>key:4</code> is created from <code>key@l</code> (which is really just a rename)
# <code>key@l</code> is updated with the new binary data.
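A toy sketch of those four steps, using an in-memory map as a stand-in for the replica's column family and the key helpers sketched earlier; the descriptor is reduced to a single version byte for brevity.
<syntaxhighlight lang="rust">
use std::collections::HashMap;

/// Toy stand-in for a replica's column family.
struct Store(HashMap<Vec<u8>, Vec<u8>>);

fn update_key(store: &mut Store, key: &[u8], new_value: Vec<u8>) {
    // 1. Read the descriptor to learn the latest version number.
    let desc_k = descriptor_key(key);
    let latest = store.0.get(&desc_k).map_or(0u8, |d| d[0]);
    // 2. Read the current value behind <key>@l ...
    if let Some(current) = store.0.remove(&latest_key(key)) {
        // 3. ... and "rename" it to an explicit version.
        store.0.insert(versioned_key(key, latest), current);
    }
    // 4. Write the new value into <key>@l and bump the descriptor.
    //    (At 255 versions the rollover described below applies instead.)
    store.0.insert(latest_key(key), new_value);
    store.0.insert(desc_k, vec![latest + 1]);
}
</syntaxhighlight>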
There is a specific scenario where an update will cause a cascading update. Updating the above example, let's say there are 255 existing versions of the key:
/shardId/data/<key>.d -> <keyValuePairDescriptor>
/shardId/data/<key>@l -> <bin-data>
/shardId/data/<key>:254 -> <bin-data>
/shardId/data/<key>:253 -> <bin-data>
/shardId/data/<key>:252 -> <bin-data>
If you issue a key update at that point, a rollover occurs: the oldest version drops off, and every remaining version is shifted down by one so the relative ordering of versions remains consistent. For example, version 254 becomes version 253, version 253 becomes version 252, and so on. Overall there will be a total of 255 writes and 257 reads if a key is already at its maximum of 255 versions. It's up to the user to determine how they want to handle this.
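Sketched against the same toy store, the rollover looks roughly like this:
<syntaxhighlight lang="rust">
fn rollover(store: &mut Store, key: &[u8]) {
    // The oldest version (1) drops off entirely.
    store.0.remove(&versioned_key(key, 1));
    // Every remaining version shifts down by one: 2 -> 1, ..., 254 -> 253,
    // freeing slot 254 for the value currently behind <key>@l.
    for v in 2..=254u8 {
        if let Some(val) = store.0.remove(&versioned_key(key, v)) {
            store.0.insert(versioned_key(key, v - 1), val);
        }
    }
}
</syntaxhighlight>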
== Vacuuming ==
The hooks in the key value pair structs exist, but the vacuum logic hasn't been implemented yet. The general idea is that vacuuming the key store would garbage collect all older key versions that are marked as vacuumable. Since we don't store versioned metadata, users would have to issue a read-then-write to update the flag. As none of this logic exists yet, suggestions are welcome!
One thing we might want to capture is keys which are always vacuumable (re: vacuumable flag set on initial put). This might allow us some optimizations, but to be determined!
2277d76ce2099c78020f1576711f2705a885a994